To see search results, type here and hit `Enter`
Faster Results
Secure Torrenting
Unlimited Search Results with detailed torrent info
1-YR Subscription to CyberGhost VPN (PRO+VPN only)

I have taken data of World Cup from kaggle.com and uploaded the data on github.com, and from github.com I am pulling the data and reading it in Jupyter notebook. I am trying to understand to potential parameters in dataset based on which I can interpret data in different dimensions. After deciding on different parameters I started with cleaning data and filtering data which will be useful for analysis.I use different tools/module like Numpy, Pandas, Matplotlib, Seaborn, OS, urllib, Plotly and many inbuilt features.
I have taken data from kaggle.com of World Cup from 1986 to 2022 (10 Years of data)
!pip install jovian opendatasets --upgrade --quiet
!pip install plotly
Collecting plotly
Downloading plotly-5.15.0-py2.py3-none-any.whl (15.5 MB)
|████████████████████████████████| 15.5 MB 20.1 MB/s
Requirement already satisfied: packaging in /opt/conda/lib/python3.9/site-packages (from plotly) (21.2)
Collecting tenacity>=6.2.0
Downloading tenacity-8.2.2-py3-none-any.whl (24 kB)
Requirement already satisfied: pyparsing<3,>=2.0.2 in /opt/conda/lib/python3.9/site-packages (from packaging->plotly) (2.4.7)
Installing collected packages: tenacity, plotly
Successfully installed plotly-5.15.0 tenacity-8.2.2
Let's begin by downloading the data, and listing the files within the dataset.
import os
import pandas as pd
import numpy as np
from urllib.request import urlretrieve
project_name = "worldcup-course-project-meetth-new" # change this (use lowercase letters and hyphens only)
!pip install jovian --upgrade -q
urlretrieve('https://raw.githubusercontent.com/meetth77/jovian_practice/main/data_wcmatches','./new.csv')
('./new.csv', <http.client.HTTPMessage at 0x7f06bf295e20>)
#checking if new file is created or not
os.listdir('./')
['.git', '.jovianrc', 'zerotopandas-course-project-Copy5.ipynb', '.ipynb_checkpoints', 'new.csv']
#Importing data
data_df=pd.read_csv('./new.csv')
#Looking at data and getting to know different columns and type of data in the columns also number of rows and columns in data
data_df
| home_team | away_team | home_score | home_xg | home_penalty | away_score | away_xg | away_penalty | home_manager | home_captain | ... | Attendance | Venue | Round | Date | Score | Notes | Host | Year | Unnamed: 20 | Unnamed: 21 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | France | 3 | 3.3 | 4.0 | 3 | 2.2 | 2.0 | Lionel Scaloni | Lionel Messi | ... | 88966 | Lusail Iconic Stadium, Lusail | Final | 12/18/2022 | (4) 3–3 (2) | Argentina won on penalty kicks following extra... | Qatar | 2022 | NaN | NaN |
| 1 | Croatia | Morocco | 2 | 0.7 | NaN | 1 | 1.2 | NaN | Zlatko Dalić | Luka Modrić | ... | 44137 | Khalifa International Stadium, Doha | Third-place match | 12/17/2022 | 2–1 | NaN | Qatar | 2022 | NaN | NaN |
| 2 | France | Morocco | 2 | 2.0 | NaN | 0 | 0.9 | NaN | Didier Deschamps | Hugo Lloris | ... | 68294 | Al Bayt Stadium, Al Khor | Semi-finals | 12/14/2022 | 2–0 | NaN | Qatar | 2022 | NaN | NaN |
| 3 | Argentina | Croatia | 3 | 2.3 | NaN | 0 | 0.5 | NaN | Lionel Scaloni | Lionel Messi | ... | 88966 | Lusail Iconic Stadium, Lusail | Semi-finals | 12/13/2022 | 3–0 | NaN | Qatar | 2022 | NaN | NaN |
| 4 | Morocco | Portugal | 1 | 1.4 | NaN | 0 | 0.9 | NaN | Hoalid Regragui | Romain Saïss | ... | 44198 | Al Thumama Stadium, ath-Thumāma | Quarter-finals | 12/10/2022 | 1–0 | NaN | Qatar | 2022 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 599 | Argentina | Korea Republic | 3 | NaN | NaN | 1 | NaN | NaN | Carlos Bilardo | Diego Maradona | ... | 60000 | Estadio Ol�mpico Universitario, Mexico City | Group stage | 6/2/1986 | 3–1 | NaN | Mexico | 1986 | NaN | NaN |
| 600 | Morocco | Poland | 0 | NaN | NaN | 0 | NaN | NaN | Jose Faria | Ezzaki Badou | ... | 19900 | Estadio Universitario, Monterrey | Group stage | 6/2/1986 | 0–0 | NaN | Mexico | 1986 | NaN | NaN |
| 601 | Spain | Brazil | 0 | NaN | NaN | 1 | NaN | NaN | Miguel Munoz | José Antonio Camacho | ... | 35748 | Jalisco, Guadalajara | Group stage | 6/1/1986 | 0–1 | NaN | Mexico | 1986 | NaN | NaN |
| 602 | Canada | France | 0 | NaN | NaN | 1 | NaN | NaN | Tony Waiters | Bruce Wilson | ... | 65500 | Nou Camp - Estadio Le�n, Leon | Group stage | 6/1/1986 | 0–1 | NaN | Mexico | 1986 | NaN | NaN |
| 603 | Bulgaria | Italy | 1 | NaN | NaN | 1 | NaN | NaN | Ivan Vutsov | Georgi Dimitrov | ... | 96000 | Estadio Azteca, Mexico City | Group stage | 5/31/1986 | 1–1 | NaN | Mexico | 1986 | NaN | NaN |
604 rows × 22 columns
#Removing unwanted columns for data
data_df.pop('Unnamed: 20')
data_df.pop('Unnamed: 21')
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
599 NaN
600 NaN
601 NaN
602 NaN
603 NaN
Name: Unnamed: 21, Length: 604, dtype: object
#Getting columns list of data in order to arrange data in specific manner
data_df.columns
Index(['home_team', 'away_team', 'home_score', 'home_xg', 'home_penalty',
'away_score', 'away_xg', 'away_penalty', 'home_manager', 'home_captain',
'away_manager', 'away_captain', 'Attendance', 'Venue', 'Round', 'Date',
'Score', 'Notes', 'Host', 'Year'],
dtype='object')
#Arranging the data from data_df in specific format for easier understanding
matches_df=data_df[[
'Year',
'Host',
'Date',
'Round',
'home_team',
'away_team',
'home_penalty',
'away_penalty',
'home_score',
'away_score',
'Score',
'Notes',
'Attendance',
'Venue',
'home_xg',
'away_xg',
'home_manager',
'away_manager',
'home_captain',
'away_captain',
]]
#Replacing west germany with germany
matches_df=matches_df.replace(['West Germany'],'Germany')
#checking the west germany is present in data or not
matches_df[matches_df.home_team=='West Germany']
| Year | Host | Date | Round | home_team | away_team | home_penalty | away_penalty | home_score | away_score | Score | Notes | Attendance | Venue | home_xg | away_xg | home_manager | away_manager | home_captain | away_captain |
|---|
# Count of Empty values or null values in dataset
matches_df.isna().sum()
Year 0 Host 0 Date 0 Round 0 home_team 0 away_team 0 home_penalty 570 away_penalty 570 home_score 0 away_score 0 Score 0 Notes 550 Attendance 0 Venue 0 home_xg 476 away_xg 476 home_manager 0 away_manager 0 home_captain 320 away_captain 320 dtype: int64
#getting info related to new data frame like null values, data type of columns
matches_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 604 entries, 0 to 603 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Year 604 non-null int64 1 Host 604 non-null object 2 Date 604 non-null object 3 Round 604 non-null object 4 home_team 604 non-null object 5 away_team 604 non-null object 6 home_penalty 34 non-null float64 7 away_penalty 34 non-null float64 8 home_score 604 non-null int64 9 away_score 604 non-null int64 10 Score 604 non-null object 11 Notes 54 non-null object 12 Attendance 604 non-null int64 13 Venue 604 non-null object 14 home_xg 128 non-null float64 15 away_xg 128 non-null float64 16 home_manager 604 non-null object 17 away_manager 604 non-null object 18 home_captain 284 non-null object 19 away_captain 284 non-null object dtypes: float64(4), int64(4), object(12) memory usage: 94.5+ KB
# Cleaned and arranged data is written into new csv file
matches_df.to_csv('./wc_cleaned.csv')
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
Data Analysis on FIFA World Cup taking numerous factors into consideration. Data Analyzed is past 10 years of World Cup data ie form 1986 to 2022.
#No of none null values in Notes column
matches_df.Notes.notna().sum()
54
#No. of unique values in Notes column
len(pd.unique(matches_df.Notes))
24
#Getting unique no of team or participants in data
teams_df=matches_df.home_team.to_list()+matches_df.away_team.to_list()
teams_df=pd.DataFrame(teams_df)
# Number of uniques team participated in world cup since 1986
teams=[]
teams=matches_df.home_team.to_list()+matches_df.away_team.to_list()
teams=set(teams)
teams=list(teams)
'Total Number of matches played in world cup from the year 1986 to 2022 are: {}.'.format(matches_df.home_team.count())
'Total Number of matches played in world cup from the year 1986 to 2022 are: 604.'
'Number of teams/countries participated in world cup from the year 1986 to 2022 are: {}.'.format(len(teams))
'Number of teams/countries participated in world cup from the year 1986 to 2022 are: 77.'
year_wise_df=matches_df.groupby('Year')
year_wise_df.sum()
| home_penalty | away_penalty | home_score | away_score | Attendance | home_xg | away_xg | |
|---|---|---|---|---|---|---|---|
| Year | |||||||
| 1986 | 11.0 | 10.0 | 74 | 58 | 2394031 | 0.0 | 0.0 |
| 1990 | 14.0 | 14.0 | 67 | 48 | 2516215 | 0.0 | 0.0 |
| 1994 | 8.0 | 10.0 | 83 | 58 | 3587538 | 0.0 | 0.0 |
| 1998 | 11.0 | 9.0 | 98 | 73 | 2903477 | 0.0 | 0.0 |
| 2002 | 6.0 | 7.0 | 89 | 72 | 2705337 | 0.0 | 0.0 |
| 2006 | 10.0 | 11.0 | 86 | 61 | 3352605 | 0.0 | 0.0 |
| 2010 | 9.0 | 5.0 | 76 | 69 | 3178856 | 0.0 | 0.0 |
| 2014 | 14.0 | 12.0 | 81 | 90 | 3429873 | 0.0 | 0.0 |
| 2018 | 12.0 | 14.0 | 91 | 78 | 3031768 | 95.4 | 71.3 |
| 2022 | 15.0 | 11.0 | 101 | 71 | 3404252 | 88.4 | 81.3 |
As we can see form dataset that data for home_xg(home expected goal) and away_xg (away expected goal) is available in 2018 and 2022 because of technology advancement, forecasting was possible.
#Getting years from data
years=pd.unique(matches_df.Year)
years
array([2022, 2018, 2014, 2010, 2006, 2002, 1998, 1994, 1990, 1986])
#getting no of teams participating in each year
team_years=[]
for i in years:
team_years_home=matches_df[matches_df.Year==i].home_team
team_years_away=matches_df[matches_df.Year==i].away_team
team_years_append=team_years_home.append(team_years_away)
team_years.append(len(pd.unique(team_years_append)))
team_years=team_years[::-1]
team_years
[24, 24, 24, 32, 32, 32, 32, 32, 32, 32]
#Getting unique list of host countries
host_countries=pd.unique(matches_df.Host)
host_countries
array(['Qatar', 'Russia', 'Brazil', 'South Africa', 'Germany',
'Korea Republic, Japan', 'France', 'United States', 'Italy',
'Mexico'], dtype=object)
#grouping the data based on years and making a new dataframe to store of year wise statistics
#Some year wise statistics
year_wise_data=pd.DataFrame()
year_wise_data['No_of_matches']=year_wise_df.count().Attendance
year_wise_data['No_of_teams']=team_years
year_wise_data['Attendance']=year_wise_df.sum().Attendance
year_wise_data['Average_attendance']=year_wise_data['Attendance']/year_wise_data['No_of_matches']
year_wise_data['Total_home_goals']=year_wise_df.sum().home_score
year_wise_data['Total_away_goals']=year_wise_df.sum().away_score
year_wise_data['Total_home_penalty']=year_wise_df.sum().home_penalty
year_wise_data['Total_away_penalty']=year_wise_df.sum().away_penalty
year_wise_data['Host']=host_countries[::-1]
ywd=year_wise_data
ywd['normal_goals']=ywd.Total_home_goals+ywd.Total_away_goals
ywd['Total_penalty']=ywd.Total_home_penalty+ywd.Total_away_penalty
ywd['Total_goals']=ywd['Total_penalty']+ywd['normal_goals']
ywd['Years']=ywd.index
ywd
| No_of_matches | No_of_teams | Attendance | Average_attendance | Total_home_goals | Total_away_goals | Total_home_penalty | Total_away_penalty | Host | normal_goals | Total_penalty | Total_goals | Years | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Year | |||||||||||||
| 1986 | 52 | 24 | 2394031 | 46039.057692 | 74 | 58 | 11.0 | 10.0 | Mexico | 132 | 21.0 | 153.0 | 1986 |
| 1990 | 52 | 24 | 2516215 | 48388.750000 | 67 | 48 | 14.0 | 14.0 | Italy | 115 | 28.0 | 143.0 | 1990 |
| 1994 | 52 | 24 | 3587538 | 68991.115385 | 83 | 58 | 8.0 | 10.0 | United States | 141 | 18.0 | 159.0 | 1994 |
| 1998 | 64 | 32 | 2903477 | 45366.828125 | 98 | 73 | 11.0 | 9.0 | France | 171 | 20.0 | 191.0 | 1998 |
| 2002 | 64 | 32 | 2705337 | 42270.890625 | 89 | 72 | 6.0 | 7.0 | Korea Republic, Japan | 161 | 13.0 | 174.0 | 2002 |
| 2006 | 64 | 32 | 3352605 | 52384.453125 | 86 | 61 | 10.0 | 11.0 | Germany | 147 | 21.0 | 168.0 | 2006 |
| 2010 | 64 | 32 | 3178856 | 49669.625000 | 76 | 69 | 9.0 | 5.0 | South Africa | 145 | 14.0 | 159.0 | 2010 |
| 2014 | 64 | 32 | 3429873 | 53591.765625 | 81 | 90 | 14.0 | 12.0 | Brazil | 171 | 26.0 | 197.0 | 2014 |
| 2018 | 64 | 32 | 3031768 | 47371.375000 | 91 | 78 | 12.0 | 14.0 | Russia | 169 | 26.0 | 195.0 | 2018 |
| 2022 | 64 | 32 | 3404252 | 53191.437500 | 101 | 71 | 15.0 | 11.0 | Qatar | 172 | 26.0 | 198.0 | 2022 |
ywd
| No_of_matches | No_of_teams | Attendance | Average_attendance | Total_home_goals | Total_away_goals | Total_home_penalty | Total_away_penalty | Host | normal_goals | Total_penalty | Total_goals | Years | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Year | |||||||||||||
| 1986 | 52 | 24 | 2394031 | 46039.057692 | 74 | 58 | 11.0 | 10.0 | Mexico | 132 | 21.0 | 153.0 | 1986 |
| 1990 | 52 | 24 | 2516215 | 48388.750000 | 67 | 48 | 14.0 | 14.0 | Italy | 115 | 28.0 | 143.0 | 1990 |
| 1994 | 52 | 24 | 3587538 | 68991.115385 | 83 | 58 | 8.0 | 10.0 | United States | 141 | 18.0 | 159.0 | 1994 |
| 1998 | 64 | 32 | 2903477 | 45366.828125 | 98 | 73 | 11.0 | 9.0 | France | 171 | 20.0 | 191.0 | 1998 |
| 2002 | 64 | 32 | 2705337 | 42270.890625 | 89 | 72 | 6.0 | 7.0 | Korea Republic, Japan | 161 | 13.0 | 174.0 | 2002 |
| 2006 | 64 | 32 | 3352605 | 52384.453125 | 86 | 61 | 10.0 | 11.0 | Germany | 147 | 21.0 | 168.0 | 2006 |
| 2010 | 64 | 32 | 3178856 | 49669.625000 | 76 | 69 | 9.0 | 5.0 | South Africa | 145 | 14.0 | 159.0 | 2010 |
| 2014 | 64 | 32 | 3429873 | 53591.765625 | 81 | 90 | 14.0 | 12.0 | Brazil | 171 | 26.0 | 197.0 | 2014 |
| 2018 | 64 | 32 | 3031768 | 47371.375000 | 91 | 78 | 12.0 | 14.0 | Russia | 169 | 26.0 | 195.0 | 2018 |
| 2022 | 64 | 32 | 3404252 | 53191.437500 | 101 | 71 | 15.0 | 11.0 | Qatar | 172 | 26.0 | 198.0 | 2022 |
#plotting the graph of attendance in world cup year wise in order to understand the trend
matplotlib.rcParams['font.size'] = 13
plt.rcParams['figure.figsize'] = (20,5)
plt.plot(ywd.index,ywd.Attendance,'.-b')
plt.xlabel('Years')
plt.ylabel('Attendance')
plt.title('World Cup Attendance')
Text(0.5, 1.0, 'World Cup Attendance')
It is clear over the year trend has been up and down but we can say that total attendance is overall increasing with years expect for 1994.
#plotting the graph of attendance per match or average attendance in world cup year wise in order to understand the trend
plt.plot(ywd.index,ywd.Average_attendance,'.-r')
plt.xlabel('Year')
plt.ylabel('Average Attendance')
plt.title('World Cup Average Attendance')
Text(0.5, 1.0, 'World Cup Average Attendance')
It is clear now after taking attendance per match in 1994 was highest. Also considering trend of attendance per match graph we can see there is slight increase in attendance per match over the years except for year 1994,1998 and 2002.
#plotting the graph of attendance per match or average attendance in world cup host wise in order to understand the trend
plt.plot(ywd.Host,ywd.Attendance,'o-c')
plt.title('Attendance')
plt.ylabel("Attendance (in millions)")
plt.xlabel('Host Nation')
Text(0.5, 0, 'Host Nation')
It is clear over the year trend has been up and down but we can say that total attendance is overall increasing with years expect when USA was host.
#plotting the graph of attendance per match or average attendance in world cup host wise in order to understand the trend
plt.plot(ywd.Host,ywd.Average_attendance,'s-m')
plt.title('Attendance per match')
plt.ylabel("Average Attendance")
plt.xlabel('Host Nation')
Text(0.5, 0, 'Host Nation')
It is clear now after seeing attendance per match was highest when USA was host. Also considering attendance per match graph we can see that there is slight increase in attendance per match over the years exception being when USA; France; Korea,Japan were the host.
#adding years(List) in dataframe in order to make manipulate data according to our need.
Years=years[::-1]
ywd['Years']=Years
ywd
| No_of_matches | No_of_teams | Attendance | Average_attendance | Total_home_goals | Total_away_goals | Total_home_penalty | Total_away_penalty | Host | normal_goals | Total_penalty | Total_goals | Years | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Year | |||||||||||||
| 1986 | 52 | 24 | 2394031 | 46039.057692 | 74 | 58 | 11.0 | 10.0 | Mexico | 132 | 21.0 | 153.0 | 1986 |
| 1990 | 52 | 24 | 2516215 | 48388.750000 | 67 | 48 | 14.0 | 14.0 | Italy | 115 | 28.0 | 143.0 | 1990 |
| 1994 | 52 | 24 | 3587538 | 68991.115385 | 83 | 58 | 8.0 | 10.0 | United States | 141 | 18.0 | 159.0 | 1994 |
| 1998 | 64 | 32 | 2903477 | 45366.828125 | 98 | 73 | 11.0 | 9.0 | France | 171 | 20.0 | 191.0 | 1998 |
| 2002 | 64 | 32 | 2705337 | 42270.890625 | 89 | 72 | 6.0 | 7.0 | Korea Republic, Japan | 161 | 13.0 | 174.0 | 2002 |
| 2006 | 64 | 32 | 3352605 | 52384.453125 | 86 | 61 | 10.0 | 11.0 | Germany | 147 | 21.0 | 168.0 | 2006 |
| 2010 | 64 | 32 | 3178856 | 49669.625000 | 76 | 69 | 9.0 | 5.0 | South Africa | 145 | 14.0 | 159.0 | 2010 |
| 2014 | 64 | 32 | 3429873 | 53591.765625 | 81 | 90 | 14.0 | 12.0 | Brazil | 171 | 26.0 | 197.0 | 2014 |
| 2018 | 64 | 32 | 3031768 | 47371.375000 | 91 | 78 | 12.0 | 14.0 | Russia | 169 | 26.0 | 195.0 | 2018 |
| 2022 | 64 | 32 | 3404252 | 53191.437500 | 101 | 71 | 15.0 | 11.0 | Qatar | 172 | 26.0 | 198.0 | 2022 |
#plotting the bar graph of home goals and away goals in world cup year wise in order to identify trends
#Using subplot and plotting two graphs side by side for comparison
fig,axes=plt.subplots(1,2)
sns.barplot(data=ywd,y='Years',x='Total_home_goals',orient='h',palette='flare',saturation=10,ax=axes[0])
axes[0].set_title('Home Goals by years')
axes[0].set_xlabel('Home Goals')
sns.barplot(data=ywd,y='Years',x='Total_away_goals',orient='h',palette='crest',saturation=10,ax=axes[1])
axes[1].set_title('Away Goals by years')
axes[1].set_xlabel('Away Goals')
Text(0.5, 0, 'Away Goals')
Maximum home goals scored was in 2022 when Qatar was host and max away goals scored was in 2014 when Brazil was host. As such there no trend with home goals and away goals
#plotting the line graph of home goals and away goals in same graph to compare home goals and away goals
matplotlib.rcParams['figure.figsize'] = (20, 5)
sns.lineplot(data=ywd,x='Years',y='Total_home_goals',marker='o',color='crimson')
sns.lineplot(data=ywd,x='Years',y='Total_away_goals',marker='s',color='darkblue')
plt.title('Goals scored')
plt.xlabel('Year')
plt.ylabel('No. of goals scored')
plt.legend(['Home goals','Away goals'])
<matplotlib.legend.Legend at 0x7f06b31d6a90>
It is clear from above graph that home goals are dominated by away goals except in 2014.
#plotting the line graph of home penalty goals and away penalty goals host wise in same graph to compare the same
sns.lineplot(data=ywd,x=ywd.Host,y='Total_home_goals',marker='s',color='crimson')
sns.lineplot(data=ywd,x='Host',y='Total_away_goals',marker='o',color='indigo')
plt.title('Goal Scored(1986-2022)')
plt.xlabel('Host Nation')
plt.ylabel('No. of Penalty')
plt.legend(['Home goals','Away goals'])
<matplotlib.legend.Legend at 0x7f06b31d6f70>
It is clear from above graph that home goals are dominated by away goals except when Brazil was host nation.
#plotting the line graph of home penalty goals and away penalty goals year wise in same graph to compare the same
matplotlib.rcParams['figure.figsize'] = (20, 5)
matplotlib.rcParams['font.size'] = 13
sns.lineplot(data=ywd,x='Years',y='Total_home_penalty',marker='s',color='slateblue')
sns.lineplot(data=ywd,x='Years',y='Total_away_penalty',marker='o',color='turquoise')
plt.title('Total Penalty')
plt.xlabel('Year')
plt.ylabel('No. of Penalty')
plt.legend(['Home penalty','Away penalty'])
<matplotlib.legend.Legend at 0x7f06b30cb400>
There is no domination of either home penalty or away penalty and even distribution is also uneven, No certain conclusion can be made regarding home and away penalties. Penalties can go either ways.
#plotting the line graph of home penalty goals and away penalty goals host wise in same graph to compare the same
sns.lineplot(data=ywd,x=ywd.Host,y='Total_home_penalty',marker='s',color='darkorange')
sns.lineplot(data=ywd,x='Host',y='Total_away_penalty',marker='o',color='indigo')
plt.title('Total Penalty (1986-2022)')
plt.xlabel('Host Nation')
plt.ylabel('No. of Penalty')
plt.legend(['Home penalty','Away penalty'])
<matplotlib.legend.Legend at 0x7f06b3005fd0>
There is no domination of either home penalty or away penalty and even distribution is also uneven, No certain conclusion can be made regarding home and away penalties. Penalties can go either ways.
matches_df
| Year | Host | Date | Round | home_team | away_team | home_penalty | away_penalty | home_score | away_score | Score | Notes | Attendance | Venue | home_xg | away_xg | home_manager | away_manager | home_captain | away_captain | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022 | Qatar | 12/18/2022 | Final | Argentina | France | 4.0 | 2.0 | 3 | 3 | (4) 3–3 (2) | Argentina won on penalty kicks following extra... | 88966 | Lusail Iconic Stadium, Lusail | 3.3 | 2.2 | Lionel Scaloni | Didier Deschamps | Lionel Messi | Hugo Lloris |
| 1 | 2022 | Qatar | 12/17/2022 | Third-place match | Croatia | Morocco | NaN | NaN | 2 | 1 | 2–1 | NaN | 44137 | Khalifa International Stadium, Doha | 0.7 | 1.2 | Zlatko Dalić | Hoalid Regragui | Luka Modrić | Hakim Ziyech |
| 2 | 2022 | Qatar | 12/14/2022 | Semi-finals | France | Morocco | NaN | NaN | 2 | 0 | 2–0 | NaN | 68294 | Al Bayt Stadium, Al Khor | 2.0 | 0.9 | Didier Deschamps | Hoalid Regragui | Hugo Lloris | Romain Saïss |
| 3 | 2022 | Qatar | 12/13/2022 | Semi-finals | Argentina | Croatia | NaN | NaN | 3 | 0 | 3–0 | NaN | 88966 | Lusail Iconic Stadium, Lusail | 2.3 | 0.5 | Lionel Scaloni | Zlatko Dalić | Lionel Messi | Luka Modrić |
| 4 | 2022 | Qatar | 12/10/2022 | Quarter-finals | Morocco | Portugal | NaN | NaN | 1 | 0 | 1–0 | NaN | 44198 | Al Thumama Stadium, ath-Thumāma | 1.4 | 0.9 | Hoalid Regragui | Fernando Santos | Romain Saïss | Pepe |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 599 | 1986 | Mexico | 6/2/1986 | Group stage | Argentina | Korea Republic | NaN | NaN | 3 | 1 | 3–1 | NaN | 60000 | Estadio Ol�mpico Universitario, Mexico City | NaN | NaN | Carlos Bilardo | Jung Nam Kim | Diego Maradona | Park Chang-sun |
| 600 | 1986 | Mexico | 6/2/1986 | Group stage | Morocco | Poland | NaN | NaN | 0 | 0 | 0–0 | NaN | 19900 | Estadio Universitario, Monterrey | NaN | NaN | Jose Faria | Antoni Piechniczek | Ezzaki Badou | Zbigniew Boniek |
| 601 | 1986 | Mexico | 6/1/1986 | Group stage | Spain | Brazil | NaN | NaN | 0 | 1 | 0–1 | NaN | 35748 | Jalisco, Guadalajara | NaN | NaN | Miguel Munoz | Tele Santana | José Antonio Camacho | Edinho |
| 602 | 1986 | Mexico | 6/1/1986 | Group stage | Canada | France | NaN | NaN | 0 | 1 | 0–1 | NaN | 65500 | Nou Camp - Estadio Le�n, Leon | NaN | NaN | Tony Waiters | Henri Michel | Bruce Wilson | Michel Platini |
| 603 | 1986 | Mexico | 5/31/1986 | Group stage | Bulgaria | Italy | NaN | NaN | 1 | 1 | 1–1 | NaN | 96000 | Estadio Azteca, Mexico City | NaN | NaN | Ivan Vutsov | Enzo Bearzot | Georgi Dimitrov | Gaetano Scirea |
604 rows × 20 columns
# Getting dataframe of when home team scored more goals
home_wins_df=matches_df[matches_df.home_score>matches_df.away_score][['Year','Host','Date','Venue','Round',
'home_team','home_score','away_score',
'Notes','home_manager','home_captain',]]
# Getting dataframe of when away team scored more goals
away_wins_df=matches_df[matches_df.home_score<matches_df.away_score][['Year','Host','Date','Venue','Round',
'away_team','home_score','away_score',
'Notes','away_manager','away_captain',]]
# Getting dataframe of Draws
draws_df=matches_df[matches_df.home_score==matches_df.away_score][['Year','Host','Date','Venue','Round',
'home_team','away_team','home_penalty','away_penalty',
'Notes','home_manager','home_captain',
'away_manager','away_captain',]]
#Reviewing data in home_win dataframe
home_wins_df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 267 entries, 1 to 599 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Year 267 non-null int64 1 Host 267 non-null object 2 Date 267 non-null object 3 Venue 267 non-null object 4 Round 267 non-null object 5 home_team 267 non-null object 6 home_score 267 non-null int64 7 away_score 267 non-null int64 8 Notes 12 non-null object 9 home_manager 267 non-null object 10 home_captain 129 non-null object dtypes: int64(3), object(8) memory usage: 25.0+ KB
#getting data of teams with most wins when playing as home team
home_wins_df.home_team.value_counts().head(20)
Brazil 29 Germany 26 Argentina 26 France 17 Italy 15 Netherlands 14 Belgium 13 England 12 Portugal 11 Spain 9 Sweden 6 Korea Republic 6 Mexico 5 Uruguay 5 Colombia 4 Russia 4 Nigeria 4 Croatia 4 Cameroon 4 Switzerland 3 Name: home_team, dtype: int64
#Storing the data in new dataframe in order to visualize it
mhw=home_wins_df.home_team.value_counts().head(15)
mhw_df=pd.DataFrame()
mhw_df['Teams']=mhw.index
mhwlist=[]
for i in range(0,15):
mhwlist.append(mhw[i])
mhw_df['Score']=mhwlist
mhw_df
| Teams | Score | |
|---|---|---|
| 0 | Brazil | 29 |
| 1 | Germany | 26 |
| 2 | Argentina | 26 |
| 3 | France | 17 |
| 4 | Italy | 15 |
| 5 | Netherlands | 14 |
| 6 | Belgium | 13 |
| 7 | England | 12 |
| 8 | Portugal | 11 |
| 9 | Spain | 9 |
| 10 | Sweden | 6 |
| 11 | Korea Republic | 6 |
| 12 | Mexico | 5 |
| 13 | Uruguay | 5 |
| 14 | Colombia | 4 |
#Visualizing data the Top 15 Teams with most wins when palying as home team
matplotlib.rcParams['figure.figsize'] = (15, 5)
sns.barplot(data=mhw_df,x='Score',y='Teams',orient='h',palette='flare_r',saturation=0.8)
plt.title('Top 15 Teams with most wins on Home during campaign of 1986 to 2022')
Text(0.5, 1.0, 'Top 15 Teams with most wins on Home during campaign of 1986 to 2022')
Team with most wins when regarded as home team is Brazil as they have won 29 times followed Germany and Argentina each with 26 wins.
#Reviewing data in home_win dataframe
away_wins_df
| Year | Host | Date | Venue | Round | away_team | home_score | away_score | Notes | away_manager | away_captain | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 2022 | Qatar | 12/10/2022 | Al Bayt Stadium, Al Khor | Quarter-finals | France | 1 | 2 | NaN | Didier Deschamps | Hugo Lloris |
| 17 | 2022 | Qatar | 12/2/2022 | Al Janoub Stadium, Al Wakrah | Group stage | Uruguay | 0 | 2 | NaN | Diego Alonso | Luis Suárez |
| 19 | 2022 | Qatar | 12/2/2022 | Stadium 974, Doha | Group stage | Switzerland | 2 | 3 | NaN | Murat Yakin | Granit Xhaka |
| 20 | 2022 | Qatar | 12/1/2022 | Al Thumama Stadium, ath-Thumāma | Group stage | Morocco | 1 | 2 | NaN | Hoalid Regragui | Romain Saïss |
| 22 | 2022 | Qatar | 12/1/2022 | Al Bayt Stadium, Al Khor | Group stage | Germany | 2 | 4 | NaN | Hansi Flick | Manuel Neuer |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 584 | 1986 | Mexico | 6/7/1986 | Tres de Marzo, Guadalajara | Group stage | Spain | 1 | 2 | NaN | Miguel Munoz | José Antonio Camacho |
| 594 | 1986 | Mexico | 6/4/1986 | Neza, Nezahualcoyotl | Group stage | Denmark | 0 | 1 | NaN | Sepp Piontek | Morten Olsen |
| 595 | 1986 | Mexico | 6/3/1986 | Estadio Azteca, Mexico City | Group stage | Mexico | 1 | 2 | NaN | Bora Milutinovic | Tomás Boy |
| 601 | 1986 | Mexico | 6/1/1986 | Jalisco, Guadalajara | Group stage | Brazil | 0 | 1 | NaN | Tele Santana | Edinho |
| 602 | 1986 | Mexico | 6/1/1986 | Nou Camp - Estadio Le�n, Leon | Group stage | France | 0 | 1 | NaN | Henri Michel | Michel Platini |
193 rows × 11 columns
#Getting data of teams with most wins when playing as away team
away_wins_df.away_team.value_counts()
Spain 14 Germany 11 France 11 Brazil 10 Mexico 9 Croatia 9 Netherlands 8 England 7 Denmark 6 Italy 6 Switzerland 6 Uruguay 6 Belgium 5 Argentina 5 Senegal 5 Costa Rica 5 Colombia 5 Morocco 4 Ghana 4 Japan 4 Ecuador 3 Paraguay 3 Romania 3 United States 3 Türkiye 3 IR Iran 3 Chile 2 Republic of Ireland 2 Serbia 2 Nigeria 2 Sweden 2 Czechoslovakia 2 Saudi Arabia 2 Poland 2 FR Yugoslavia 1 Norway 1 Yugoslavia 1 Bulgaria 1 Soviet Union 1 Scotland 1 Jamaica 1 Slovenia 1 Korea Republic 1 Portugal 1 Czech Republic 1 Ukraine 1 South Africa 1 Côte d'Ivoire 1 Algeria 1 Australia 1 Peru 1 Tunisia 1 Cameroon 1 Name: away_team, dtype: int64
#Storing the data in new data frame in order to visualize it
maw=away_wins_df.away_team.value_counts().head(15)
maw_df=pd.DataFrame()
maw_df['Teams']=maw.index
mawlist=[]
for i in range(0,15):
mawlist.append(maw[i])
maw_df['Score']=mawlist
maw_df
| Teams | Score | |
|---|---|---|
| 0 | Spain | 14 |
| 1 | Germany | 11 |
| 2 | France | 11 |
| 3 | Brazil | 10 |
| 4 | Mexico | 9 |
| 5 | Croatia | 9 |
| 6 | Netherlands | 8 |
| 7 | England | 7 |
| 8 | Denmark | 6 |
| 9 | Italy | 6 |
| 10 | Switzerland | 6 |
| 11 | Uruguay | 6 |
| 12 | Belgium | 5 |
| 13 | Argentina | 5 |
| 14 | Senegal | 5 |
#Visualizing data the Top 15 Teams with most wins when playing as away team
sns.barplot(data=maw_df,x='Score',y='Teams',orient='h',palette='crest_r',saturation=1)
plt.title('Top 15 Teams with most wins on Away during campaign of 1986 to 2022')
Text(0.5, 1.0, 'Top 15 Teams with most wins on Away during campaign of 1986 to 2022')
Team with most wins when regarded as away team is Spain as they have won 14 times followed Germany and France each with 11 wins.
#plotting the graph of team with most win as home team and team with most win as away team side by side for comparison.
plt.rcParams['figure.figsize'] = (18,8)
fig,axes=plt.subplots(1,2)
sns.barplot(data=mhw_df,x='Score',y='Teams',orient='h',palette='flare_r',saturation=0.8,ax=axes[0])
axes[0].set_title('Top 15 Teams with most wins on Home during campaign of 1986 to 2022')
sns.barplot(data=maw_df,x='Score',y='Teams',orient='h',palette='crest_r',saturation=1,ax=axes[1])
axes[1].set_title('Top 15 Teams with most wins on Away during campaign of 1986 to 2022')
plt.show()
As we can see from the graph there are a lot more home wins than away wins. Brazil has 29 home wins and and 10 away wins, Total = 39 wins Germany has 26 home wins and 11 away wins, Total = 37 wins Argentina has 26 home wins and 5 away wins, Total = 31 wins France has 17 home wins and 11 away wins, Total = 28 wins Note: These wins are excluding penalty wins
#Reviewing data in draws dataframe
draws_df
| Year | Host | Date | Venue | Round | home_team | away_team | home_penalty | away_penalty | Notes | home_manager | home_captain | away_manager | away_captain | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022 | Qatar | 12/18/2022 | Lusail Iconic Stadium, Lusail | Final | Argentina | France | 4.0 | 2.0 | Argentina won on penalty kicks following extra... | Lionel Scaloni | Lionel Messi | Didier Deschamps | Hugo Lloris |
| 6 | 2022 | Qatar | 12/9/2022 | Education City Stadium, Doha | Quarter-finals | Croatia | Brazil | 4.0 | 2.0 | Croatia won on penalty kicks following extra time | Zlatko Dalić | Luka Modrić | Tite | Thiago Silva |
| 7 | 2022 | Qatar | 12/9/2022 | Lusail Iconic Stadium, Lusail | Quarter-finals | Netherlands | Argentina | 3.0 | 4.0 | Argentina won on penalty kicks following extra... | Louis van Gaal | Virgil van Dijk | Lionel Scaloni | Lionel Messi |
| 8 | 2022 | Qatar | 12/6/2022 | Education City Stadium, Doha | Round of 16 | Morocco | Spain | 3.0 | 0.0 | Morocco won on penalty kicks following extra time | Hoalid Regragui | Romain Saïss | Luis Enrique | Sergio Busquets |
| 10 | 2022 | Qatar | 12/5/2022 | Al Janoub Stadium, Al Wakrah | Round of 16 | Japan | Croatia | 1.0 | 3.0 | Croatia won on penalty kicks following extra time | Hajime Moriyasu | Maya Yoshida | Zlatko Dalić | Luka Modrić |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 591 | 1986 | Mexico | 6/5/1986 | Estadio Ol�mpico Universitario, Mexico City | Group stage | Korea Republic | Bulgaria | NaN | NaN | NaN | Jung Nam Kim | Park Chang-sun | Ivan Vutsov | Georgi Dimitrov |
| 593 | 1986 | Mexico | 6/4/1986 | Estadio Corregidora, Queretaro | Group stage | Uruguay | Germany | NaN | NaN | NaN | Omar Borras | Jorge Barrios | Franz Beckenbauer | Harald Schumacher |
| 596 | 1986 | Mexico | 6/3/1986 | Tres de Marzo, Guadalajara | Group stage | Algeria | Northern Ireland | NaN | NaN | NaN | Rabah Saadane | Mahmoud Guendouz | Billy Bingham | Sammy McIlroy |
| 600 | 1986 | Mexico | 6/2/1986 | Estadio Universitario, Monterrey | Group stage | Morocco | Poland | NaN | NaN | NaN | Jose Faria | Ezzaki Badou | Antoni Piechniczek | Zbigniew Boniek |
| 603 | 1986 | Mexico | 5/31/1986 | Estadio Azteca, Mexico City | Group stage | Bulgaria | Italy | NaN | NaN | NaN | Ivan Vutsov | Georgi Dimitrov | Enzo Bearzot | Gaetano Scirea |
144 rows × 14 columns
#Adding a new column in draws data frame as margin where margin is diff b/w home_penalty and away_penalty
draws_df["margin"]=draws_df.home_penalty-draws_df.away_penalty
draws_df
| Year | Host | Date | Venue | Round | home_team | away_team | home_penalty | away_penalty | Notes | home_manager | home_captain | away_manager | away_captain | margin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022 | Qatar | 12/18/2022 | Lusail Iconic Stadium, Lusail | Final | Argentina | France | 4.0 | 2.0 | Argentina won on penalty kicks following extra... | Lionel Scaloni | Lionel Messi | Didier Deschamps | Hugo Lloris | 2.0 |
| 6 | 2022 | Qatar | 12/9/2022 | Education City Stadium, Doha | Quarter-finals | Croatia | Brazil | 4.0 | 2.0 | Croatia won on penalty kicks following extra time | Zlatko Dalić | Luka Modrić | Tite | Thiago Silva | 2.0 |
| 7 | 2022 | Qatar | 12/9/2022 | Lusail Iconic Stadium, Lusail | Quarter-finals | Netherlands | Argentina | 3.0 | 4.0 | Argentina won on penalty kicks following extra... | Louis van Gaal | Virgil van Dijk | Lionel Scaloni | Lionel Messi | -1.0 |
| 8 | 2022 | Qatar | 12/6/2022 | Education City Stadium, Doha | Round of 16 | Morocco | Spain | 3.0 | 0.0 | Morocco won on penalty kicks following extra time | Hoalid Regragui | Romain Saïss | Luis Enrique | Sergio Busquets | 3.0 |
| 10 | 2022 | Qatar | 12/5/2022 | Al Janoub Stadium, Al Wakrah | Round of 16 | Japan | Croatia | 1.0 | 3.0 | Croatia won on penalty kicks following extra time | Hajime Moriyasu | Maya Yoshida | Zlatko Dalić | Luka Modrić | -2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 591 | 1986 | Mexico | 6/5/1986 | Estadio Ol�mpico Universitario, Mexico City | Group stage | Korea Republic | Bulgaria | NaN | NaN | NaN | Jung Nam Kim | Park Chang-sun | Ivan Vutsov | Georgi Dimitrov | NaN |
| 593 | 1986 | Mexico | 6/4/1986 | Estadio Corregidora, Queretaro | Group stage | Uruguay | Germany | NaN | NaN | NaN | Omar Borras | Jorge Barrios | Franz Beckenbauer | Harald Schumacher | NaN |
| 596 | 1986 | Mexico | 6/3/1986 | Tres de Marzo, Guadalajara | Group stage | Algeria | Northern Ireland | NaN | NaN | NaN | Rabah Saadane | Mahmoud Guendouz | Billy Bingham | Sammy McIlroy | NaN |
| 600 | 1986 | Mexico | 6/2/1986 | Estadio Universitario, Monterrey | Group stage | Morocco | Poland | NaN | NaN | NaN | Jose Faria | Ezzaki Badou | Antoni Piechniczek | Zbigniew Boniek | NaN |
| 603 | 1986 | Mexico | 5/31/1986 | Estadio Azteca, Mexico City | Group stage | Bulgaria | Italy | NaN | NaN | NaN | Ivan Vutsov | Georgi Dimitrov | Enzo Bearzot | Gaetano Scirea | NaN |
144 rows × 15 columns
# Checking for Nan values in Margin column of draws dataframe
draws_df.margin.isna().sum()
110
# checking if above obsevation made is true
draws_df[draws_df.Round=='Group stage']
| Year | Host | Date | Venue | Round | home_team | away_team | home_penalty | away_penalty | Notes | home_manager | home_captain | away_manager | away_captain | margin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | 2022 | Qatar | 12/1/2022 | Ahmed bin Ali Stadium, Umm al-Afā'ī | Group stage | Croatia | Belgium | NaN | NaN | NaN | Zlatko Dalić | Luka Modrić | Roberto Martínez | Kevin De Bruyne | NaN |
| 32 | 2022 | Qatar | 11/28/2022 | Al Janoub Stadium, Al Wakrah | Group stage | Cameroon | Serbia | NaN | NaN | NaN | Rigobert Song | Eric Maxim Choupo-Moting | Dragan Stojković | Dušan Tadić | NaN |
| 39 | 2022 | Qatar | 11/27/2022 | Al Bayt Stadium, Al Khor | Group stage | Spain | Germany | NaN | NaN | NaN | Luis Enrique | Sergio Busquets | Hansi Flick | Manuel Neuer | NaN |
| 46 | 2022 | Qatar | 11/25/2022 | Khalifa International Stadium, Doha | Group stage | Netherlands | Ecuador | NaN | NaN | NaN | Louis van Gaal | Virgil van Dijk | Gustavo Alfaro | Enner Valencia | NaN |
| 47 | 2022 | Qatar | 11/25/2022 | Al Bayt Stadium, Al Khor | Group stage | England | United States | NaN | NaN | NaN | Gareth Southgate | Harry Kane | Gregg Berhalter | Tyler Adams | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 591 | 1986 | Mexico | 6/5/1986 | Estadio Ol�mpico Universitario, Mexico City | Group stage | Korea Republic | Bulgaria | NaN | NaN | NaN | Jung Nam Kim | Park Chang-sun | Ivan Vutsov | Georgi Dimitrov | NaN |
| 593 | 1986 | Mexico | 6/4/1986 | Estadio Corregidora, Queretaro | Group stage | Uruguay | Germany | NaN | NaN | NaN | Omar Borras | Jorge Barrios | Franz Beckenbauer | Harald Schumacher | NaN |
| 596 | 1986 | Mexico | 6/3/1986 | Tres de Marzo, Guadalajara | Group stage | Algeria | Northern Ireland | NaN | NaN | NaN | Rabah Saadane | Mahmoud Guendouz | Billy Bingham | Sammy McIlroy | NaN |
| 600 | 1986 | Mexico | 6/2/1986 | Estadio Universitario, Monterrey | Group stage | Morocco | Poland | NaN | NaN | NaN | Jose Faria | Ezzaki Badou | Antoni Piechniczek | Zbigniew Boniek | NaN |
| 603 | 1986 | Mexico | 5/31/1986 | Estadio Azteca, Mexico City | Group stage | Bulgaria | Italy | NaN | NaN | NaN | Ivan Vutsov | Georgi Dimitrov | Enzo Bearzot | Gaetano Scirea | NaN |
110 rows × 15 columns
draws_df[draws_df.Round=='Group stage'].info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 110 entries, 21 to 603 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Year 110 non-null int64 1 Host 110 non-null object 2 Date 110 non-null object 3 Venue 110 non-null object 4 Round 110 non-null object 5 home_team 110 non-null object 6 away_team 110 non-null object 7 home_penalty 0 non-null float64 8 away_penalty 0 non-null float64 9 Notes 0 non-null object 10 home_manager 110 non-null object 11 home_captain 46 non-null object 12 away_manager 110 non-null object 13 away_captain 46 non-null object 14 margin 0 non-null float64 dtypes: float64(3), int64(1), object(11) memory usage: 13.8+ KB
#Dataframe of home penalty wins
draws_df[draws_df.margin>0].info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 18 entries, 0 to 559 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Year 18 non-null int64 1 Host 18 non-null object 2 Date 18 non-null object 3 Venue 18 non-null object 4 Round 18 non-null object 5 home_team 18 non-null object 6 away_team 18 non-null object 7 home_penalty 18 non-null float64 8 away_penalty 18 non-null float64 9 Notes 18 non-null object 10 home_manager 18 non-null object 11 home_captain 8 non-null object 12 away_manager 18 non-null object 13 away_captain 8 non-null object 14 margin 18 non-null float64 dtypes: float64(3), int64(1), object(11) memory usage: 2.2+ KB
#Dataframe of away penalty wins
draws_df[draws_df.margin<0].info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 16 entries, 7 to 558 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Year 16 non-null int64 1 Host 16 non-null object 2 Date 16 non-null object 3 Venue 16 non-null object 4 Round 16 non-null object 5 home_team 16 non-null object 6 away_team 16 non-null object 7 home_penalty 16 non-null float64 8 away_penalty 16 non-null float64 9 Notes 16 non-null object 10 home_manager 16 non-null object 11 home_captain 11 non-null object 12 away_manager 16 non-null object 13 away_captain 11 non-null object 14 margin 16 non-null float64 dtypes: float64(3), int64(1), object(11) memory usage: 2.0+ KB
#Getting data of teams with most penalty wins when playing as home team
home_penalty_win=draws_df[draws_df.margin>0]
home_penalty_win.home_team.value_counts()
Brazil 3 Germany 3 Argentina 2 Croatia 2 Morocco 1 Netherlands 1 Costa Rica 1 Uruguay 1 Paraguay 1 Italy 1 Spain 1 Republic of Ireland 1 Name: home_team, dtype: int64
#Getting data of teams with most penalty wins when playing as away team
away_penalty_win=draws_df[draws_df.margin<0]
away_penalty_win.away_team.value_counts()
Argentina 4 Croatia 2 France 2 England 1 Russia 1 Portugal 1 Ukraine 1 Korea Republic 1 Sweden 1 Bulgaria 1 Belgium 1 Name: away_team, dtype: int64
#Storing the data in new data frame in order to visualize it
hpw=home_penalty_win.home_team.value_counts()
hpw_list=[]
hpw_df=pd.DataFrame()
hpw_df['Home_Team']=hpw.index
for i in range(0,len(hpw)):
hpw_list.append(hpw[i])
hpw_df['penalty_wins']=hpw_list
hpw_df
| Home_Team | penalty_wins | |
|---|---|---|
| 0 | Brazil | 3 |
| 1 | Germany | 3 |
| 2 | Argentina | 2 |
| 3 | Croatia | 2 |
| 4 | Morocco | 1 |
| 5 | Netherlands | 1 |
| 6 | Costa Rica | 1 |
| 7 | Uruguay | 1 |
| 8 | Paraguay | 1 |
| 9 | Italy | 1 |
| 10 | Spain | 1 |
| 11 | Republic of Ireland | 1 |
#Storing the data in new data frame in order to visualize it
apw=away_penalty_win.away_team.value_counts()
apw_list=[]
apw_df=pd.DataFrame()
apw_df['Away_Team']=apw.index
for i in range(0,len(apw)):
apw_list.append(apw[i])
apw_df['penalty_wins']=apw_list
apw_df
| Away_Team | penalty_wins | |
|---|---|---|
| 0 | Argentina | 4 |
| 1 | Croatia | 2 |
| 2 | France | 2 |
| 3 | England | 1 |
| 4 | Russia | 1 |
| 5 | Portugal | 1 |
| 6 | Ukraine | 1 |
| 7 | Korea Republic | 1 |
| 8 | Sweden | 1 |
| 9 | Bulgaria | 1 |
| 10 | Belgium | 1 |
#Visualizing data of Teams with most peanlty wins when playing as home team and away team using subplots.
fig,axes=plt.subplots(1,2)
plt.tight_layout(pad=5)
sns.barplot(data=hpw_df,y='Home_Team',x='penalty_wins',palette='flare_r',ax=axes[0])
axes[0].set_title('Team with most penalty win on Home')
axes[0].set_ylabel(None)
sns.barplot(data=apw_df,y='Away_Team',x='penalty_wins',palette='crest_r',ax=axes[1])
axes[1].set_title('Team with most penalty win on Away')
axes[1].set_ylabel(None)
Text(678.2295454545454, 0.5, '')
Argentina has record of highest penalty with 2 penalty wins at home and 4 (highest on away) penalty wins at away, total of 6 penalty wins.
Croatia also has good record in penalty wins with 2 wins at home and 2 at away, total of 4 penalty wins.
Brazil and Germany both have only won penalty as home team, having 3 penalty wins each.
#Creating a new dataset and copying the data of original dataframe into new dataframe
data_df_1=data_df
#Replacing the West Germany with Germany in new dataframe as in October 1990 reunification happened
data_df_1['home_team']=data_df_1['home_team'].replace(['West Germany'],'Germany')
data_df_1['away_team']=data_df_1['away_team'].replace(['West Germany'],'Germany')
#Creating new columns margin
data_df_1['margin']=data_df_1.home_score-data_df_1.away_score
#Based on margins columns creating new columns as Winner
#Winner columns contain list of winning teams in that particular match based on normal score(margin)
data_df_1.loc[data_df_1.margin>0,'Winner']=data_df_1.home_team
data_df_1.loc[data_df_1.margin<0,'Winner']=data_df_1.away_team
data_df_1.loc[data_df_1.margin==0,'Winner']='Draw'
#Creating new column of penalty margin
data_df_1['penalty_margin']=data_df_1.home_penalty-data_df_1.away_penalty
#Based on penalty margin columns creating new columns as Winner
#Winner columns contain list of winning teams in that particular match based on penalty score(penalty margin)
data_df_1.loc[data_df_1.penalty_margin>0,'Winner']=data_df_1.home_team
data_df_1.loc[data_df_1.penalty_margin<0,'Winner']=data_df_1.away_team
data_df_1.loc[data_df_1.penalty_margin==0,'Winner']='Draw'
#Reviewing data in new dataframe after the above manipualtion
data_df_1
| home_team | away_team | home_score | home_xg | home_penalty | away_score | away_xg | away_penalty | home_manager | home_captain | ... | Venue | Round | Date | Score | Notes | Host | Year | margin | Winner | penalty_margin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | France | 3 | 3.3 | 4.0 | 3 | 2.2 | 2.0 | Lionel Scaloni | Lionel Messi | ... | Lusail Iconic Stadium, Lusail | Final | 12/18/2022 | (4) 3–3 (2) | Argentina won on penalty kicks following extra... | Qatar | 2022 | 0 | Argentina | 2.0 |
| 1 | Croatia | Morocco | 2 | 0.7 | NaN | 1 | 1.2 | NaN | Zlatko Dalić | Luka Modrić | ... | Khalifa International Stadium, Doha | Third-place match | 12/17/2022 | 2–1 | NaN | Qatar | 2022 | 1 | Croatia | NaN |
| 2 | France | Morocco | 2 | 2.0 | NaN | 0 | 0.9 | NaN | Didier Deschamps | Hugo Lloris | ... | Al Bayt Stadium, Al Khor | Semi-finals | 12/14/2022 | 2–0 | NaN | Qatar | 2022 | 2 | France | NaN |
| 3 | Argentina | Croatia | 3 | 2.3 | NaN | 0 | 0.5 | NaN | Lionel Scaloni | Lionel Messi | ... | Lusail Iconic Stadium, Lusail | Semi-finals | 12/13/2022 | 3–0 | NaN | Qatar | 2022 | 3 | Argentina | NaN |
| 4 | Morocco | Portugal | 1 | 1.4 | NaN | 0 | 0.9 | NaN | Hoalid Regragui | Romain Saïss | ... | Al Thumama Stadium, ath-Thumāma | Quarter-finals | 12/10/2022 | 1–0 | NaN | Qatar | 2022 | 1 | Morocco | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 599 | Argentina | Korea Republic | 3 | NaN | NaN | 1 | NaN | NaN | Carlos Bilardo | Diego Maradona | ... | Estadio Ol�mpico Universitario, Mexico City | Group stage | 6/2/1986 | 3–1 | NaN | Mexico | 1986 | 2 | Argentina | NaN |
| 600 | Morocco | Poland | 0 | NaN | NaN | 0 | NaN | NaN | Jose Faria | Ezzaki Badou | ... | Estadio Universitario, Monterrey | Group stage | 6/2/1986 | 0–0 | NaN | Mexico | 1986 | 0 | Draw | NaN |
| 601 | Spain | Brazil | 0 | NaN | NaN | 1 | NaN | NaN | Miguel Munoz | José Antonio Camacho | ... | Jalisco, Guadalajara | Group stage | 6/1/1986 | 0–1 | NaN | Mexico | 1986 | -1 | Brazil | NaN |
| 602 | Canada | France | 0 | NaN | NaN | 1 | NaN | NaN | Tony Waiters | Bruce Wilson | ... | Nou Camp - Estadio Le�n, Leon | Group stage | 6/1/1986 | 0–1 | NaN | Mexico | 1986 | -1 | France | NaN |
| 603 | Bulgaria | Italy | 1 | NaN | NaN | 1 | NaN | NaN | Ivan Vutsov | Georgi Dimitrov | ... | Estadio Azteca, Mexico City | Group stage | 5/31/1986 | 1–1 | NaN | Mexico | 1986 | 0 | Draw | NaN |
604 rows × 23 columns
#Reviewing the new dataframe using info function
data_df_1.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 604 entries, 0 to 603 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 home_team 604 non-null object 1 away_team 604 non-null object 2 home_score 604 non-null int64 3 home_xg 128 non-null float64 4 home_penalty 34 non-null float64 5 away_score 604 non-null int64 6 away_xg 128 non-null float64 7 away_penalty 34 non-null float64 8 home_manager 604 non-null object 9 home_captain 284 non-null object 10 away_manager 604 non-null object 11 away_captain 284 non-null object 12 Attendance 604 non-null int64 13 Venue 604 non-null object 14 Round 604 non-null object 15 Date 604 non-null object 16 Score 604 non-null object 17 Notes 54 non-null object 18 Host 604 non-null object 19 Year 604 non-null int64 20 margin 604 non-null int64 21 Winner 604 non-null object 22 penalty_margin 34 non-null float64 dtypes: float64(5), int64(5), object(13) memory usage: 108.7+ KB
#Understanding the Winner column of the new dataframe
data_df_1.Winner.value_counts()
Draw 110 Brazil 42 Germany 40 Argentina 37 France 30 Spain 24 Netherlands 23 Italy 22 England 20 Belgium 19 Croatia 17 Mexico 14 Portugal 13 Uruguay 12 Denmark 9 Sweden 9 Colombia 9 Switzerland 9 Korea Republic 8 Costa Rica 7 Japan 7 Morocco 6 United States 6 Romania 6 Paraguay 6 Nigeria 6 Cameroon 5 Russia 5 Senegal 5 Ecuador 5 Poland 5 Ghana 5 Saudi Arabia 4 Türkiye 4 Australia 4 Bulgaria 4 Chile 4 Republic of Ireland 3 Côte d'Ivoire 3 Soviet Union 3 Yugoslavia 3 Czechoslovakia 3 IR Iran 3 Ukraine 3 Norway 2 FR Yugoslavia 2 Tunisia 2 South Africa 2 Greece 2 Serbia 2 Czech Republic 1 Slovenia 1 Jamaica 1 Slovakia 1 Algeria 1 Bosnia and Herzegovina 1 Austria 1 Peru 1 Scotland 1 Hungary 1 Name: Winner, dtype: int64
#Storing the data in new data frame in order to visualize it
most_wins=data_df_1[data_df_1.Winner!='Draw']
team_most_wins=pd.DataFrame()
team_most_wins['Team']=most_wins.Winner.value_counts().index
team_most_wins['No. of wins']=most_wins.Winner.value_counts().to_list()
tmw=team_most_wins.head(15)
tmw
| Team | No. of wins | |
|---|---|---|
| 0 | Brazil | 42 |
| 1 | Germany | 40 |
| 2 | Argentina | 37 |
| 3 | France | 30 |
| 4 | Spain | 24 |
| 5 | Netherlands | 23 |
| 6 | Italy | 22 |
| 7 | England | 20 |
| 8 | Belgium | 19 |
| 9 | Croatia | 17 |
| 10 | Mexico | 14 |
| 11 | Portugal | 13 |
| 12 | Uruguay | 12 |
| 13 | Denmark | 9 |
| 14 | Sweden | 9 |
#Visualizing data the Top 15 Teams with most wins.
matplotlib.rcParams['figure.figsize']=(15,5)
matplotlib.rcParams['font.size']=12
ax=sns.barplot(data=tmw,y='Team',x='No. of wins',orient='h',palette='flare_r',saturation=0.9)
ax.set(title='Team with most win during the World Cup campaign of 1986 to 2022 (Top 15)')
for p in ax.patches:
height=p.get_height()
width=p.get_width()
ax.text(x=width+0.1, y=p.get_y()+(height/2),s='{:.0f}'.format(width) , va='center')
plt.show()
Most dominating team during the World Cup from 1986 to 2022 was Brazil as they have won 42 games followed by Germany with 40 wins and Argentina being third with 37 wins.
#Using the group by on new dataframe and creating a another dataframe and storing the values in it.
#Here groupby func is used and groupby is based on rounds which is different from previously used group by function.
group_wise_data=pd.DataFrame()
group_wise_data['No. of matches']=data_df_1.groupby(by='Round').count()['Attendance']
group_wise_data['Total_Attendance']=data_df_1.groupby(by='Round').sum()['Attendance']
group_wise_data['Average_Attendance']=group_wise_data['Total_Attendance']/group_wise_data['No. of matches']
group_wise_data['goals_per_match']=(data_df_1.groupby(by='Round').sum()['home_score']/group_wise_data['No. of matches'])+(data_df_1.groupby(by='Round').sum()['away_score']/group_wise_data['No. of matches'])
group_wise_data['penalty_goals_per_match']=(data_df_1.groupby(by='Round').sum()['home_penalty']/group_wise_data['No. of matches'])+(data_df_1.groupby(by='Round').sum()['away_penalty']/group_wise_data['No. of matches'])
group_wise_data['Round']=data_df_1.groupby(by='Round').count()['Attendance'].index
gwd_df=group_wise_data
#Viewing newly created dataframe
gwd_df['Total_goals_per_match']=gwd_df.goals_per_match+gwd_df.penalty_goals_per_match
gwd_df
| No. of matches | Total_Attendance | Average_Attendance | goals_per_match | penalty_goals_per_match | Round | Total_goals_per_match | |
|---|---|---|---|---|---|---|---|
| Round | |||||||
| Final | 10 | 826631 | 82663.100000 | 2.700000 | 1.9000 | Final | 4.600000 |
| Group stage | 444 | 21329631 | 48039.709459 | 2.502252 | 0.0000 | Group stage | 2.502252 |
| Quarter-finals | 40 | 2319091 | 57977.275000 | 2.175000 | 2.3250 | Quarter-finals | 4.500000 |
| Round of 16 | 80 | 4102200 | 51277.500000 | 2.700000 | 0.9375 | Round of 16 | 3.637500 |
| Semi-finals | 20 | 1390159 | 69507.950000 | 2.250000 | 1.3000 | Semi-finals | 3.550000 |
| Third-place match | 10 | 536240 | 53624.000000 | 3.800000 | 0.0000 | Third-place match | 3.800000 |
# From the above dataframe creating line graph of attendance per match in different rounds/stages in World Cup
# in order to understand the trend of attendance round wise.
matplotlib.rcParams['figure.figsize']=(15,5)
sns.lineplot(data=gwd_df.sort_values(by='Average_Attendance'),x='Round',y='Average_Attendance',color='#cc0099',marker='o')
plt.show()
From above graph it is clear that attendance per round is highest in final followed by semi-finals and then quarter-finals which makes a lot of sense due to level of high of competition in those stages of World Cup.
#same data but instead of line graph barplot is used for simpler understanding.
matplotlib.rcParams['figure.figsize']=(10,5)
sns.barplot(data=gwd_df.sort_values(by='Average_Attendance',ascending=False),y='Round',x='Average_Attendance',
palette='crest_r',saturation=0.8)
plt.title('Average Attendance in World Cup during the campaign of 1986 to 2022 on basis of Round')
plt.xlabel('Average Attendance')
plt.show()
From above graph it is clear that attendance per round is highest in final followed by semi-finals and then quarter-finals which makes a lot of sense due to level of high of competition in those stages of World Cup.
#From the dataframe creating barplot of goals per match in different rounds/stages in World Cup
sns.barplot(data=gwd_df.sort_values(by='goals_per_match',ascending=False),y='Round',x='goals_per_match',
palette='flare_r',saturation=0.8)
plt.title('Goals per match in World Cup during the campaign of 1986 to 2022 on basis of Round')
plt.xlabel('Goals per match')
plt.show()
From the above graph goals per match is highest in Third-place match which is quite interesting and contrary to popular belief that most goals per match will be in Finals.
Also Round of 16 and group stage coming above the semi-finals and quarter-finals in term of goals scored is also quite intriguing observation.
May be the reason for less goals per matches in semis and quarters will be related to penalties goals being more
#From the dataframe creating barplot of penalty goals per match in different rounds/stages in World Cup
sns.barplot(data=gwd_df.sort_values(by='penalty_goals_per_match',ascending=False),y='Round',x='penalty_goals_per_match',
palette='Purples_r',saturation=0.8)
plt.title('Penalty Goals per match in World Cup during the campaign of 1986 to 2022 on basis of Round')
plt.xlabel('penalty goals per match')
plt.show()
In group stage there are no penalties that why in above we can see in above graph penalty per match in group stage but, no penalty goals in third place match is quite alluring. That is chances of third-place match going in penalties is very very less.
In Quarter-finals stage penalty per match is highest having value 2.33 which is quite high and contrary to popular belief.
#From the dataframe creating barplot of total goals per match in different rounds/stages in World Cup
sns.barplot(data=gwd_df.sort_values(by='Total_goals_per_match',ascending=False),y='Round',x='Total_goals_per_match',
palette='flare_r',saturation=0.8)
plt.title('Total Goals per match in World Cup during the campaign of 1986 to 2022 on basis of Round')
plt.xlabel('Total goals per match')
Text(0.5, 0, 'Total goals per match')
Now the picture is pretty clear when total goals per match graph is seen with most goals per match being Finals followed by quarter finals.
Finals_data_df=data_df_1[data_df_1['Round']=='Final']
fd_df=Finals_data_df
# From the above dataframe creating line graph of attendance in finals host wise
matplotlib.rcParams['figure.figsize']=(20,5)
sns.lineplot(data=fd_df.iloc[::-1],x='Host',y='Attendance',marker='s',color='#0000A5')
plt.title('World Cup Attendance in Finals')
plt.xlabel(None)
plt.ylabel(None)
plt.show()
Highest attendance in finals in accordance to graph is when host was Mexico even though we have previously seen that total attendance and attendance per match was highest when USA was host.
Lowest attendance in finals recorded when Korea,Japan (joint host) and Germany were the host
#From the dataframe creating line graph of attendance in finals year wise which will same graph but interpretation is diff
sns.lineplot(data=fd_df,x='Year',y='Attendance',marker='s',color='#0000A5')
plt.title('World Cup Attendance in Finals on basis of Year')
plt.ylabel(None)
plt.show()
Highest attendance in finals in accordance to graph is in 1986 even though we have previously seen that total attendance and attendance per match was highest in 1994.
Lowest attendance in finals recorded in 2002 and 2006.
data_df_1.loc[data_df_1.margin>0,'Winning_manager']=data_df_1.home_manager
data_df_1.loc[data_df_1.margin<0,'Winning_manager']=data_df_1.away_manager
data_df_1.loc[data_df_1.margin==0,'Winning_manager']='No Win'
data_df_1.loc[data_df_1.penalty_margin>0,'Winning_manager']=data_df_1.home_manager
data_df_1.loc[data_df_1.penalty_margin<0,'Winning_manager']=data_df_1.away_manager
data_df_1.loc[data_df_1.penalty_margin==0,'Winning_manager']='No Win'
data_df_1.loc[data_df_1.margin>0,'Winning_captain']=data_df_1.home_captain
data_df_1.loc[data_df_1.margin<0,'Winning_captain']=data_df_1.away_captain
data_df_1.loc[data_df_1.margin==0,'Winning_captain']='No Win'
data_df_1.loc[data_df_1.penalty_margin>0,'Winning_captain']=data_df_1.home_captain
data_df_1.loc[data_df_1.penalty_margin<0,'Winning_captain']=data_df_1.away_captain
data_df_1.loc[data_df_1.penalty_margin==0,'Winning_captain']='No Win'
data_df_1
| home_team | away_team | home_score | home_xg | home_penalty | away_score | away_xg | away_penalty | home_manager | home_captain | ... | Date | Score | Notes | Host | Year | margin | Winner | penalty_margin | Winning_manager | Winning_captain | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | France | 3 | 3.3 | 4.0 | 3 | 2.2 | 2.0 | Lionel Scaloni | Lionel Messi | ... | 12/18/2022 | (4) 3–3 (2) | Argentina won on penalty kicks following extra... | Qatar | 2022 | 0 | Argentina | 2.0 | Lionel Scaloni | Lionel Messi |
| 1 | Croatia | Morocco | 2 | 0.7 | NaN | 1 | 1.2 | NaN | Zlatko Dalić | Luka Modrić | ... | 12/17/2022 | 2–1 | NaN | Qatar | 2022 | 1 | Croatia | NaN | Zlatko Dalić | Luka Modrić |
| 2 | France | Morocco | 2 | 2.0 | NaN | 0 | 0.9 | NaN | Didier Deschamps | Hugo Lloris | ... | 12/14/2022 | 2–0 | NaN | Qatar | 2022 | 2 | France | NaN | Didier Deschamps | Hugo Lloris |
| 3 | Argentina | Croatia | 3 | 2.3 | NaN | 0 | 0.5 | NaN | Lionel Scaloni | Lionel Messi | ... | 12/13/2022 | 3–0 | NaN | Qatar | 2022 | 3 | Argentina | NaN | Lionel Scaloni | Lionel Messi |
| 4 | Morocco | Portugal | 1 | 1.4 | NaN | 0 | 0.9 | NaN | Hoalid Regragui | Romain Saïss | ... | 12/10/2022 | 1–0 | NaN | Qatar | 2022 | 1 | Morocco | NaN | Hoalid Regragui | Romain Saïss |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 599 | Argentina | Korea Republic | 3 | NaN | NaN | 1 | NaN | NaN | Carlos Bilardo | Diego Maradona | ... | 6/2/1986 | 3–1 | NaN | Mexico | 1986 | 2 | Argentina | NaN | Carlos Bilardo | Diego Maradona |
| 600 | Morocco | Poland | 0 | NaN | NaN | 0 | NaN | NaN | Jose Faria | Ezzaki Badou | ... | 6/2/1986 | 0–0 | NaN | Mexico | 1986 | 0 | Draw | NaN | No Win | No Win |
| 601 | Spain | Brazil | 0 | NaN | NaN | 1 | NaN | NaN | Miguel Munoz | José Antonio Camacho | ... | 6/1/1986 | 0–1 | NaN | Mexico | 1986 | -1 | Brazil | NaN | Tele Santana | Edinho |
| 602 | Canada | France | 0 | NaN | NaN | 1 | NaN | NaN | Tony Waiters | Bruce Wilson | ... | 6/1/1986 | 0–1 | NaN | Mexico | 1986 | -1 | France | NaN | Henri Michel | Michel Platini |
| 603 | Bulgaria | Italy | 1 | NaN | NaN | 1 | NaN | NaN | Ivan Vutsov | Georgi Dimitrov | ... | 5/31/1986 | 1–1 | NaN | Mexico | 1986 | 0 | Draw | NaN | No Win | No Win |
604 rows × 25 columns
#Reviewing the data using info func
data_df_1.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 604 entries, 0 to 603 Data columns (total 25 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 home_team 604 non-null object 1 away_team 604 non-null object 2 home_score 604 non-null int64 3 home_xg 128 non-null float64 4 home_penalty 34 non-null float64 5 away_score 604 non-null int64 6 away_xg 128 non-null float64 7 away_penalty 34 non-null float64 8 home_manager 604 non-null object 9 home_captain 284 non-null object 10 away_manager 604 non-null object 11 away_captain 284 non-null object 12 Attendance 604 non-null int64 13 Venue 604 non-null object 14 Round 604 non-null object 15 Date 604 non-null object 16 Score 604 non-null object 17 Notes 54 non-null object 18 Host 604 non-null object 19 Year 604 non-null int64 20 margin 604 non-null int64 21 Winner 604 non-null object 22 penalty_margin 34 non-null float64 23 Winning_manager 604 non-null object 24 Winning_captain 348 non-null object dtypes: float64(5), int64(5), object(15) memory usage: 118.1+ KB
# Creating a new dataframe of only with manager having wins
manager_wins=data_df_1[data_df_1.Winning_manager!='No Win']
#Viewing the new dataframe
manager_wins
| home_team | away_team | home_score | home_xg | home_penalty | away_score | away_xg | away_penalty | home_manager | home_captain | ... | Date | Score | Notes | Host | Year | margin | Winner | penalty_margin | Winning_manager | Winning_captain | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | France | 3 | 3.3 | 4.0 | 3 | 2.2 | 2.0 | Lionel Scaloni | Lionel Messi | ... | 12/18/2022 | (4) 3–3 (2) | Argentina won on penalty kicks following extra... | Qatar | 2022 | 0 | Argentina | 2.0 | Lionel Scaloni | Lionel Messi |
| 1 | Croatia | Morocco | 2 | 0.7 | NaN | 1 | 1.2 | NaN | Zlatko Dalić | Luka Modrić | ... | 12/17/2022 | 2–1 | NaN | Qatar | 2022 | 1 | Croatia | NaN | Zlatko Dalić | Luka Modrić |
| 2 | France | Morocco | 2 | 2.0 | NaN | 0 | 0.9 | NaN | Didier Deschamps | Hugo Lloris | ... | 12/14/2022 | 2–0 | NaN | Qatar | 2022 | 2 | France | NaN | Didier Deschamps | Hugo Lloris |
| 3 | Argentina | Croatia | 3 | 2.3 | NaN | 0 | 0.5 | NaN | Lionel Scaloni | Lionel Messi | ... | 12/13/2022 | 3–0 | NaN | Qatar | 2022 | 3 | Argentina | NaN | Lionel Scaloni | Lionel Messi |
| 4 | Morocco | Portugal | 1 | 1.4 | NaN | 0 | 0.9 | NaN | Hoalid Regragui | Romain Saïss | ... | 12/10/2022 | 1–0 | NaN | Qatar | 2022 | 1 | Morocco | NaN | Hoalid Regragui | Romain Saïss |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 597 | Portugal | England | 1 | NaN | NaN | 0 | NaN | NaN | Jose Torres | Manuel Bento | ... | 6/3/1986 | 1–0 | NaN | Mexico | 1986 | 1 | Portugal | NaN | Jose Torres | Manuel Bento |
| 598 | Soviet Union | Hungary | 6 | NaN | NaN | 0 | NaN | NaN | Valeriy Lobanovskyi | Anatoliy Demyanenko | ... | 6/2/1986 | 6–0 | NaN | Mexico | 1986 | 6 | Soviet Union | NaN | Valeriy Lobanovskyi | Anatoliy Demyanenko |
| 599 | Argentina | Korea Republic | 3 | NaN | NaN | 1 | NaN | NaN | Carlos Bilardo | Diego Maradona | ... | 6/2/1986 | 3–1 | NaN | Mexico | 1986 | 2 | Argentina | NaN | Carlos Bilardo | Diego Maradona |
| 601 | Spain | Brazil | 0 | NaN | NaN | 1 | NaN | NaN | Miguel Munoz | José Antonio Camacho | ... | 6/1/1986 | 0–1 | NaN | Mexico | 1986 | -1 | Brazil | NaN | Tele Santana | Edinho |
| 602 | Canada | France | 0 | NaN | NaN | 1 | NaN | NaN | Tony Waiters | Bruce Wilson | ... | 6/1/1986 | 0–1 | NaN | Mexico | 1986 | -1 | France | NaN | Henri Michel | Michel Platini |
494 rows × 25 columns
#Getting data of manager with most wins
manager_wins.Winning_manager.value_counts()
Luiz Felipe Scolari 16
Didier Deschamps 14
Joachim Löw 12
Carlos Alberto Parreira 11
Zlatko Dalić 10
..
Juan Antonio Pizzi 1
Gernot Rohr 1
Ricardo La Volpe 1
Karel Brückner 1
Jose Torres 1
Name: Winning_manager, Length: 170, dtype: int64
#Storing the data in new data frame in order to visualize it
most_win_manager=pd.DataFrame()
most_win_manager['Manager']=manager_wins.Winning_manager.value_counts().head(20).index
most_win_manager['Country_managed']=[
"Brazil",
"France",
"Germany",
["South Africa,Brazil"],
"Croatia",
"Germany",
"Uruguay",
"Argentina",
"Netherlands",
["Australia, Korea Republic, Netherlands"],
["United States,Germany"],
"England",
"Belgium",
["Côte d'Ivoire, Morocco, France"],
"France",
"Spain",
"Argentina",
"Colombia",
"Argentina",
"Italy",
]
most_win_manager['Wins']=manager_wins.Winning_manager.value_counts().head(20).to_list()
most_win_manager
| Manager | Country_managed | Wins | |
|---|---|---|---|
| 0 | Luiz Felipe Scolari | Brazil | 16 |
| 1 | Didier Deschamps | France | 14 |
| 2 | Joachim Löw | Germany | 12 |
| 3 | Carlos Alberto Parreira | [South Africa,Brazil] | 11 |
| 4 | Zlatko Dalić | Croatia | 10 |
| 5 | Franz Beckenbauer | Germany | 10 |
| 6 | Óscar Tabárez | Uruguay | 10 |
| 7 | Carlos Bilardo | Argentina | 10 |
| 8 | Louis van Gaal | Netherlands | 9 |
| 9 | Guus Hiddink | [Australia, Korea Republic, Netherlands] | 8 |
| 10 | Jürgen Klinsmann | [United States,Germany] | 7 |
| 11 | Gareth Southgate | England | 7 |
| 12 | Roberto Martínez | Belgium | 7 |
| 13 | Henri Michel | [Côte d'Ivoire, Morocco, France] | 7 |
| 14 | Aimé Jacquet | France | 7 |
| 15 | Vicente del Bosque | Spain | 6 |
| 16 | Lionel Scaloni | Argentina | 6 |
| 17 | José Pekerman | Colombia | 6 |
| 18 | Alejandro Sabella | Argentina | 6 |
| 19 | Marcello Lippi | Italy | 6 |
#Visualizing the data
matplotlib.rcParams['figure.figsize']=(20,8)
matplotlib.rcParams['font.size']=14
sns.barplot(data=most_win_manager.head(15),x='Wins',y='Manager',palette='mako',saturation=0.8)
plt.ylabel(None)
plt.title('Most Successful Managers during the World-Cup champaign of 1986 to 2022 (Top 15)')
plt.show()
Manager Luiz Felipe Scolari is the most successful manager having 16 wins managing Brazil followed by Didier Deschamps with 14 wins managing France and at third place is Joachim Löw with Germany having 12 wins.
Manager Carlos Alberto Parreira has managed two teams during his World Cup campaign and still managed to get 11 wins. The Two unique teams managed were South Africa,Brazil.
#reviewing data_df_1 dataframe for getting info of captains with most wins
data_df_1.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 604 entries, 0 to 603 Data columns (total 25 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 home_team 604 non-null object 1 away_team 604 non-null object 2 home_score 604 non-null int64 3 home_xg 128 non-null float64 4 home_penalty 34 non-null float64 5 away_score 604 non-null int64 6 away_xg 128 non-null float64 7 away_penalty 34 non-null float64 8 home_manager 604 non-null object 9 home_captain 284 non-null object 10 away_manager 604 non-null object 11 away_captain 284 non-null object 12 Attendance 604 non-null int64 13 Venue 604 non-null object 14 Round 604 non-null object 15 Date 604 non-null object 16 Score 604 non-null object 17 Notes 54 non-null object 18 Host 604 non-null object 19 Year 604 non-null int64 20 margin 604 non-null int64 21 Winner 604 non-null object 22 penalty_margin 34 non-null float64 23 Winning_manager 604 non-null object 24 Winning_captain 348 non-null object dtypes: float64(5), int64(5), object(15) memory usage: 118.1+ KB
data_df_1.groupby('Year').count()[['Winning_captain','home_captain','away_captain']]
| Winning_captain | home_captain | away_captain | |
|---|---|---|---|
| Year | |||
| 1986 | 52 | 52 | 52 |
| 1990 | 52 | 52 | 52 |
| 1994 | 52 | 52 | 52 |
| 1998 | 16 | 0 | 0 |
| 2002 | 14 | 0 | 0 |
| 2006 | 11 | 0 | 0 |
| 2010 | 14 | 0 | 0 |
| 2014 | 9 | 0 | 0 |
| 2018 | 64 | 64 | 64 |
| 2022 | 64 | 64 | 64 |
data_df.groupby('Year').count()[['home_captain','away_captain']]
| home_captain | away_captain | |
|---|---|---|
| Year | ||
| 1986 | 52 | 52 |
| 1990 | 52 | 52 |
| 1994 | 52 | 52 |
| 1998 | 0 | 0 |
| 2002 | 0 | 0 |
| 2006 | 0 | 0 |
| 2010 | 0 | 0 |
| 2014 | 0 | 0 |
| 2018 | 64 | 64 |
| 2022 | 64 | 64 |
#Importing the newly created dataset
urlretrieve('https://raw.githubusercontent.com/meetth77/jovian_practice/main/Worldcup_matches_edited_1.csv','./cleaned_data.csv')
os.listdir('./')
['.git', '.jovianrc', 'zerotopandas-course-project-Copy5.ipynb', '.ipynb_checkpoints', 'new.csv', 'wc_cleaned.csv', 'cleaned_data.csv']
#Reading the data and reading the data into newly created new dataframe
cleaned_data_df=pd.read_csv('./cleaned_data.csv')
cleaned_data_df.pop('Unnamed: 0')
cleaned_data_df
| home_team | away_team | home_score | home_xg | home_penalty | away_score | away_xg | away_penalty | home_manager | away_manager | ... | Date | Score | Notes | Host | margin | penalty_margin | Year | Winner | Winning_manager | Winning_captain | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | France | 3 | 3.3 | 4.0 | 3 | 2.2 | 2.0 | Lionel Scaloni | Didier Deschamps | ... | 12/18/2022 | (4) 3–3 (2) | Argentina won on penalty kicks following extra... | Qatar | 0 | 2.0 | 2022 | Argentina | Lionel Scaloni | Lionel Messi |
| 1 | Croatia | Morocco | 2 | 0.7 | NaN | 1 | 1.2 | NaN | Zlatko Dalić | Hoalid Regragui | ... | 12/17/2022 | 2–1 | NaN | Qatar | 1 | NaN | 2022 | Croatia | Zlatko Dalić | Luka Modrić |
| 2 | France | Morocco | 2 | 2.0 | NaN | 0 | 0.9 | NaN | Didier Deschamps | Hoalid Regragui | ... | 12/14/2022 | 2–0 | NaN | Qatar | 2 | NaN | 2022 | France | Didier Deschamps | Hugo Lloris |
| 3 | Argentina | Croatia | 3 | 2.3 | NaN | 0 | 0.5 | NaN | Lionel Scaloni | Zlatko Dalić | ... | 12/13/2022 | 3–0 | NaN | Qatar | 3 | NaN | 2022 | Argentina | Lionel Scaloni | Lionel Messi |
| 4 | Morocco | Portugal | 1 | 1.4 | NaN | 0 | 0.9 | NaN | Hoalid Regragui | Fernando Santos | ... | 12/10/2022 | 1–0 | NaN | Qatar | 1 | NaN | 2022 | Morocco | Hoalid Regragui | Romain Saïss |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 599 | Argentina | Korea Republic | 3 | NaN | NaN | 1 | NaN | NaN | Carlos Bilardo | Jung Nam Kim | ... | 6/2/1986 | 3–1 | NaN | Mexico | 2 | NaN | 1986 | Argentina | Carlos Bilardo | Diego Maradona |
| 600 | Morocco | Poland | 0 | NaN | NaN | 0 | NaN | NaN | Jose Faria | Antoni Piechniczek | ... | 6/2/1986 | 0–0 | NaN | Mexico | 0 | NaN | 1986 | Draw | Draw | Draw |
| 601 | Spain | Brazil | 0 | NaN | NaN | 1 | NaN | NaN | Miguel Munoz | Tele Santana | ... | 6/1/1986 | 0–1 | NaN | Mexico | -1 | NaN | 1986 | Brazil | Tele Santana | Edinho |
| 602 | Canada | France | 0 | NaN | NaN | 1 | NaN | NaN | Tony Waiters | Henri Michel | ... | 6/1/1986 | 0–1 | NaN | Mexico | -1 | NaN | 1986 | France | Henri Michel | Michel Platini |
| 603 | Bulgaria | Italy | 1 | NaN | NaN | 1 | NaN | NaN | Ivan Vutsov | Enzo Bearzot | ... | 5/31/1986 | 1–1 | NaN | Mexico | 0 | NaN | 1986 | Draw | Draw | Draw |
604 rows × 25 columns
#reviewing the new dataframe
cleaned_data_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 604 entries, 0 to 603 Data columns (total 25 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 home_team 604 non-null object 1 away_team 604 non-null object 2 home_score 604 non-null int64 3 home_xg 128 non-null float64 4 home_penalty 34 non-null float64 5 away_score 604 non-null int64 6 away_xg 128 non-null float64 7 away_penalty 34 non-null float64 8 home_manager 604 non-null object 9 away_manager 604 non-null object 10 home_captain 284 non-null object 11 away_captain 284 non-null object 12 Attendance 604 non-null int64 13 Venue 604 non-null object 14 Round 604 non-null object 15 Date 604 non-null object 16 Score 604 non-null object 17 Notes 54 non-null object 18 Host 604 non-null object 19 margin 604 non-null int64 20 penalty_margin 34 non-null float64 21 Year 604 non-null int64 22 Winner 604 non-null object 23 Winning_manager 604 non-null object 24 Winning_captain 604 non-null object dtypes: float64(5), int64(5), object(15) memory usage: 118.1+ KB
As we see from the above table that data Winning_captain there are 604 values.
#Storing the data in new data frame in order to visualize it
ccdd=cleaned_data_df[cleaned_data_df.Winning_captain!='Draw']
ccd_df=pd.DataFrame()
ccd_df['Captain']=ccdd.Winning_captain.value_counts().head(20).index
ccd_df['Team']=[
"France",
"Argentina",
"Argentina",
"Germany",
"Brazil",
"Croatia",
"Germany",
"Brazil",
"Brazil",
"France",
"England",
"Spain",
"Germany",
"Netherlands",
"Belgium",
"Netherlands",
"Uruguay",
"Italy",
"Italy",
"Croatia",
]
ccd_df['Wins']=ccdd.Winning_captain.value_counts().head(20).to_list()
ccd_df
| Captain | Team | Wins | |
|---|---|---|---|
| 0 | Hugo Lloris | France | 14 |
| 1 | Lionel Messi | Argentina | 13 |
| 2 | Diego Maradona | Argentina | 12 |
| 3 | Cafu | Germany | 11 |
| 4 | Philipp Lahm | Brazil | 11 |
| 5 | Luka Modrić | Croatia | 10 |
| 6 | Dunga | Germany | 9 |
| 7 | Lothar Matthäus | Brazil | 9 |
| 8 | Thiago Silva | Brazil | 9 |
| 9 | Harry Kane | France | 7 |
| 10 | Iker Casillas | England | 7 |
| 11 | Didier Deschamps | Spain | 7 |
| 12 | Robin van Persie | Germany | 6 |
| 13 | Diego Godín | Netherlands | 6 |
| 14 | Fabio Cannavaro | Belgium | 6 |
| 15 | Michael Ballack | Netherlands | 6 |
| 16 | Giovanni van Bronckhorst | Uruguay | 6 |
| 17 | Eden Hazard | Italy | 6 |
| 18 | Giuseppe Bergomi | Italy | 6 |
| 19 | Zvonimir Boban | Croatia | 5 |
#Visualizing the above data
sns.barplot(data=ccd_df.head(15),y="Captain",x="Wins",orient='h',palette='crest_r',saturation=0.8)
plt.ylabel('Captains')
plt.title('Most Successful Captains during the World-Cup champaign of 1986 to 2022 (Top 15)')
plt.show()
Player with most wins as Captain is Hugo Lloris, France picking up 14 wins under the leadership of Hugo Lloris.
Second player with most wins as Captain is Lionel Messi, Argentina picking up 13 wins under his leadership
Third player with most wins as Captain is Diego Maradona, Argentina picking up 12 wins under his leadership
#Creating a new dataset containing data of final round of every World Cup
worldcup_winner=cleaned_data_df[cleaned_data_df.Round=='Final']
worldcup_winner=worldcup_winner[['Date','Year','Venue','Host','home_team','away_team','Score','Winner',
'Notes','home_manager','away_manager','Winning_manager',
'home_captain','away_captain','Winning_captain','Attendance',
'home_score','away_score','home_penalty','away_penalty','home_xg','away_xg',]]
worldcup_winner
| Date | Year | Venue | Host | home_team | away_team | Score | Winner | Notes | home_manager | ... | home_captain | away_captain | Winning_captain | Attendance | home_score | away_score | home_penalty | away_penalty | home_xg | away_xg | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12/18/2022 | 2022 | Lusail Iconic Stadium, Lusail | Qatar | Argentina | France | (4) 3–3 (2) | Argentina | Argentina won on penalty kicks following extra... | Lionel Scaloni | ... | Lionel Messi | Hugo Lloris | Lionel Messi | 88966 | 3 | 3 | 4.0 | 2.0 | 3.3 | 2.2 |
| 64 | 7/15/2018 | 2018 | Olimpiyskiy stadion Luzhniki, Moskva | Russia | France | Croatia | 4–2 | France | NaN | Didier Deschamps | ... | Hugo Lloris | Luka Modrić | Hugo Lloris | 78011 | 4 | 2 | NaN | NaN | 1.1 | 1.1 |
| 128 | 7/13/2014 | 2014 | Estadio Jornalista Mário Filho, Rio de Janeiro | Brazil | Germany | Argentina | 1–0 | Germany | Required Extra Time | Joachim Löw | ... | NaN | NaN | Philipp Lahm | 74738 | 1 | 0 | NaN | NaN | NaN | NaN |
| 192 | 7/11/2010 | 2010 | Soccer City, Johannesburg | South Africa | Netherlands | Spain | 0–1 | Spain | Required Extra Time | Bert van Marwijk | ... | NaN | NaN | Iker Casillas | 84490 | 0 | 1 | NaN | NaN | NaN | NaN |
| 256 | 7/9/2006 | 2006 | Olympiastadion Berlin, Berlin | Germany | Italy | France | (5) 1–1 (3) | Italy | Italy won on penalty kicks following extra time | Marcello Lippi | ... | NaN | NaN | Fabio Cannavaro | 69000 | 1 | 1 | 5.0 | 3.0 | NaN | NaN |
| 320 | 6/30/2002 | 2002 | Nissan Stadium, Yokohama | Korea Republic, Japan | Germany | Brazil | 0–2 | Brazil | NaN | Rudi Völler | ... | NaN | NaN | Cafu | 69029 | 0 | 2 | NaN | NaN | NaN | NaN |
| 384 | 7/12/1998 | 1998 | Stade de France, Paris | France | Brazil | France | 0–3 | France | NaN | Mário Zagallo | ... | NaN | NaN | Didier Deschamps | 80000 | 0 | 3 | NaN | NaN | NaN | NaN |
| 448 | 7/17/1994 | 1994 | Rose Bowl, Los Angeles | United States | Brazil | Italy | (3) 0–0 (2) | Brazil | Brazil won on penalty kicks following extra time | Carlos Alberto Parreira | ... | Dunga | Franco Baresi | Dunga | 94194 | 0 | 0 | 3.0 | 2.0 | NaN | NaN |
| 500 | 7/8/1990 | 1990 | Stadio Olimpico, Rome | Italy | West Germany | Argentina | 1–0 | West Germany | NaN | Franz Beckenbauer | ... | Lothar Matthäus | Diego Maradona | Lothar Matthäus | 73603 | 1 | 0 | NaN | NaN | NaN | NaN |
| 552 | 6/29/1986 | 1986 | Estadio Azteca, Mexico City | Mexico | Argentina | West Germany | 3–2 | Argentina | NaN | Carlos Bilardo | ... | Diego Maradona | Karl-Heinz Rummenigge | Diego Maradona | 114600 | 3 | 2 | NaN | NaN | NaN | NaN |
10 rows × 22 columns
#Analyzing data of Winner, finding out team with most World Cups
worldcup_winner.Winner.value_counts()
wcw_df=pd.DataFrame()
wcw_df['Team']=worldcup_winner.Winner.value_counts().index
wcw_df['Wins']=worldcup_winner.Winner.value_counts().to_list()
wcw_df
| Team | Wins | |
|---|---|---|
| 0 | Argentina | 2 |
| 1 | France | 2 |
| 2 | Brazil | 2 |
| 3 | Germany | 1 |
| 4 | Spain | 1 |
| 5 | Italy | 1 |
| 6 | West Germany | 1 |
#Visualizing the above data
sns.barplot(data=wcw_df,x='Team',y='Wins',palette='rocket')
plt.title('World Cup Winning Teams from 1986 to 2022')
plt.xlabel(None)
plt.show()
Teams with most World Cup are Argentina, France, and Brazil each team with 2 World Cup trophy in past 10 World Cups from 1986 to 2022.
#probability of finals going in overtime or in penalties
prob_otpen=(worldcup_winner.Notes.notna().sum()/worldcup_winner.Year.count())
print('Probability World Cup Finals going in overtime or penalties is {:0.2f}%.'.format(prob_otpen*100))
Probability World Cup Finals going in overtime or penalties is 50.00%.
prob_pen=(worldcup_winner.Notes.notna() & worldcup_winner.home_penalty.notna()).sum()/worldcup_winner.Year.count()
print('Probability World Cup Finals going in penalties is {:0.2f}%.'.format(prob_pen*100))
Probability World Cup Finals going in penalties is 30.00%.
prob_ot=(worldcup_winner.Notes.notna() & worldcup_winner.home_penalty.isna()).sum()/worldcup_winner.Year.count()
print('Probability World Cup Finals going in overtime/extratime is {:0.2f}%.'.format(prob_ot*100))
Probability World Cup Finals going in overtime/extratime is 20.00%.
#Creating a new dataframe as worldcup_winner_1 using previous dataframe
#Replacing Nan values with zero in order to perform addition of that column and creating new columns
worldcup_winner_1=worldcup_winner
worldcup_winner_1['home_score']=worldcup_winner['home_score'].fillna(0)
worldcup_winner_1['away_score']=worldcup_winner['away_score'].fillna(0)
worldcup_winner_1['home_penalty']=worldcup_winner['home_penalty'].fillna(0)
worldcup_winner_1['away_penalty']=worldcup_winner['away_penalty'].fillna(0)
worldcup_winner_1['Total_goals']=worldcup_winner.home_score+worldcup_winner.away_score+worldcup_winner.home_penalty+worldcup_winner.away_penalty
worldcup_winner_1
| Date | Year | Venue | Host | home_team | away_team | Score | Winner | Notes | home_manager | ... | away_captain | Winning_captain | Attendance | home_score | away_score | home_penalty | away_penalty | home_xg | away_xg | Total_goals | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12/18/2022 | 2022 | Lusail Iconic Stadium, Lusail | Qatar | Argentina | France | (4) 3–3 (2) | Argentina | Argentina won on penalty kicks following extra... | Lionel Scaloni | ... | Hugo Lloris | Lionel Messi | 88966 | 3 | 3 | 4.0 | 2.0 | 3.3 | 2.2 | 12.0 |
| 64 | 7/15/2018 | 2018 | Olimpiyskiy stadion Luzhniki, Moskva | Russia | France | Croatia | 4–2 | France | NaN | Didier Deschamps | ... | Luka Modrić | Hugo Lloris | 78011 | 4 | 2 | 0.0 | 0.0 | 1.1 | 1.1 | 6.0 |
| 128 | 7/13/2014 | 2014 | Estadio Jornalista Mário Filho, Rio de Janeiro | Brazil | Germany | Argentina | 1–0 | Germany | Required Extra Time | Joachim Löw | ... | NaN | Philipp Lahm | 74738 | 1 | 0 | 0.0 | 0.0 | NaN | NaN | 1.0 |
| 192 | 7/11/2010 | 2010 | Soccer City, Johannesburg | South Africa | Netherlands | Spain | 0–1 | Spain | Required Extra Time | Bert van Marwijk | ... | NaN | Iker Casillas | 84490 | 0 | 1 | 0.0 | 0.0 | NaN | NaN | 1.0 |
| 256 | 7/9/2006 | 2006 | Olympiastadion Berlin, Berlin | Germany | Italy | France | (5) 1–1 (3) | Italy | Italy won on penalty kicks following extra time | Marcello Lippi | ... | NaN | Fabio Cannavaro | 69000 | 1 | 1 | 5.0 | 3.0 | NaN | NaN | 10.0 |
| 320 | 6/30/2002 | 2002 | Nissan Stadium, Yokohama | Korea Republic, Japan | Germany | Brazil | 0–2 | Brazil | NaN | Rudi Völler | ... | NaN | Cafu | 69029 | 0 | 2 | 0.0 | 0.0 | NaN | NaN | 2.0 |
| 384 | 7/12/1998 | 1998 | Stade de France, Paris | France | Brazil | France | 0–3 | France | NaN | Mário Zagallo | ... | NaN | Didier Deschamps | 80000 | 0 | 3 | 0.0 | 0.0 | NaN | NaN | 3.0 |
| 448 | 7/17/1994 | 1994 | Rose Bowl, Los Angeles | United States | Brazil | Italy | (3) 0–0 (2) | Brazil | Brazil won on penalty kicks following extra time | Carlos Alberto Parreira | ... | Franco Baresi | Dunga | 94194 | 0 | 0 | 3.0 | 2.0 | NaN | NaN | 5.0 |
| 500 | 7/8/1990 | 1990 | Stadio Olimpico, Rome | Italy | West Germany | Argentina | 1–0 | West Germany | NaN | Franz Beckenbauer | ... | Diego Maradona | Lothar Matthäus | 73603 | 1 | 0 | 0.0 | 0.0 | NaN | NaN | 1.0 |
| 552 | 6/29/1986 | 1986 | Estadio Azteca, Mexico City | Mexico | Argentina | West Germany | 3–2 | Argentina | NaN | Carlos Bilardo | ... | Karl-Heinz Rummenigge | Diego Maradona | 114600 | 3 | 2 | 0.0 | 0.0 | NaN | NaN | 5.0 |
10 rows × 23 columns
#Visualizing the data Total goals scored in finals of every World Cup year wise
matplotlib.rcParams['figure.figsize']=(20,5)
sns.lineplot(data=worldcup_winner_1[::-1], x='Host', y='Total_goals',marker='s',color='#B048B5')
plt.title('Total goals in finals game of every FIFA World Cup from 1986 to 2022 (Including Penalties)')
plt.xlabel(None)
plt.ylabel('Goals')
plt.show()
Sudden spike in goals when USA,Germany,and Qatar were host can be explained due to fact that matches went to penalties as can be seen in data frame given above.
Trend of goals in final game is kind of all over the place but we can say it is increasing if consider the sudden spike are due to penalties.
#Visualizing the data Total goals scored in finals of every World Cup Host wise
sns.lineplot(data=worldcup_winner_1[::-1], x='Year', y='Total_goals',marker='s',color='#B048B5')
plt.title('Total goals in finals game of every FIFA World Cup from 1986 to 2022 (Including Penalties)')
plt.ylabel('Goals')
plt.show()
Sudden spike in goals in year 1994,2006,2022 can be explained due to fact that matches went to penalties as can be seen in data frame given below.
Trend of goals in final game is kind of all over the place but we can say it is increasing if consider the sudden spike are due to penalties.
#worldcup_winner
worldcup_winner
| Date | Year | Venue | Host | home_team | away_team | Score | Winner | Notes | home_manager | ... | away_captain | Winning_captain | Attendance | home_score | away_score | home_penalty | away_penalty | home_xg | away_xg | Total_goals | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12/18/2022 | 2022 | Lusail Iconic Stadium, Lusail | Qatar | Argentina | France | (4) 3–3 (2) | Argentina | Argentina won on penalty kicks following extra... | Lionel Scaloni | ... | Hugo Lloris | Lionel Messi | 88966 | 3 | 3 | 4.0 | 2.0 | 3.3 | 2.2 | 12.0 |
| 64 | 7/15/2018 | 2018 | Olimpiyskiy stadion Luzhniki, Moskva | Russia | France | Croatia | 4–2 | France | NaN | Didier Deschamps | ... | Luka Modrić | Hugo Lloris | 78011 | 4 | 2 | 0.0 | 0.0 | 1.1 | 1.1 | 6.0 |
| 128 | 7/13/2014 | 2014 | Estadio Jornalista Mário Filho, Rio de Janeiro | Brazil | Germany | Argentina | 1–0 | Germany | Required Extra Time | Joachim Löw | ... | NaN | Philipp Lahm | 74738 | 1 | 0 | 0.0 | 0.0 | NaN | NaN | 1.0 |
| 192 | 7/11/2010 | 2010 | Soccer City, Johannesburg | South Africa | Netherlands | Spain | 0–1 | Spain | Required Extra Time | Bert van Marwijk | ... | NaN | Iker Casillas | 84490 | 0 | 1 | 0.0 | 0.0 | NaN | NaN | 1.0 |
| 256 | 7/9/2006 | 2006 | Olympiastadion Berlin, Berlin | Germany | Italy | France | (5) 1–1 (3) | Italy | Italy won on penalty kicks following extra time | Marcello Lippi | ... | NaN | Fabio Cannavaro | 69000 | 1 | 1 | 5.0 | 3.0 | NaN | NaN | 10.0 |
| 320 | 6/30/2002 | 2002 | Nissan Stadium, Yokohama | Korea Republic, Japan | Germany | Brazil | 0–2 | Brazil | NaN | Rudi Völler | ... | NaN | Cafu | 69029 | 0 | 2 | 0.0 | 0.0 | NaN | NaN | 2.0 |
| 384 | 7/12/1998 | 1998 | Stade de France, Paris | France | Brazil | France | 0–3 | France | NaN | Mário Zagallo | ... | NaN | Didier Deschamps | 80000 | 0 | 3 | 0.0 | 0.0 | NaN | NaN | 3.0 |
| 448 | 7/17/1994 | 1994 | Rose Bowl, Los Angeles | United States | Brazil | Italy | (3) 0–0 (2) | Brazil | Brazil won on penalty kicks following extra time | Carlos Alberto Parreira | ... | Franco Baresi | Dunga | 94194 | 0 | 0 | 3.0 | 2.0 | NaN | NaN | 5.0 |
| 500 | 7/8/1990 | 1990 | Stadio Olimpico, Rome | Italy | West Germany | Argentina | 1–0 | West Germany | NaN | Franz Beckenbauer | ... | Diego Maradona | Lothar Matthäus | 73603 | 1 | 0 | 0.0 | 0.0 | NaN | NaN | 1.0 |
| 552 | 6/29/1986 | 1986 | Estadio Azteca, Mexico City | Mexico | Argentina | West Germany | 3–2 | Argentina | NaN | Carlos Bilardo | ... | Karl-Heinz Rummenigge | Diego Maradona | 114600 | 3 | 2 | 0.0 | 0.0 | NaN | NaN | 5.0 |
10 rows × 23 columns
worldcup_winner=worldcup_winner.replace(['West Germany'],'Germany')
urlretrieve('https://raw.githubusercontent.com/meetth77/jovian_practice/main/world_cup_winner.csv','./winner_data.csv')
winner_data_df_1=pd.read_csv('./winner_data.csv')
winner_data_df_1
| Year | Champion | Runner_Up | TopScorer | Goals | |
|---|---|---|---|---|---|
| 0 | 2022 | Argentina | France | Kylian Mbappé | 8 |
| 1 | 2018 | France | Croatia | Harry Kane | 6 |
| 2 | 2014 | Germany | Argentina | James Rodríguez | 6 |
| 3 | 2010 | Spain | Netherlands | Wesley Sneijder, Thomas Müller... | 5 |
| 4 | 2006 | Italy | France | Miroslav Klose | 5 |
| 5 | 2002 | Brazil | Germany | Ronaldo | 8 |
| 6 | 1998 | France | Brazil | Davor Šuker | 6 |
| 7 | 1994 | Brazil | Italy | Hristo Stoichkov, Oleg Salenko | 6 |
| 8 | 1990 | Germany | Argentina | Salvatore Schillaci | 6 |
| 9 | 1986 | Argentina | Germany | Gary Lineker | 6 |
# Merging the worldcup_winner and winner_data_df_1 to view the combined dataset which provides more insights
worldcup_winner=worldcup_winner.rename(columns={'Winner':'Champion'})
worldcup_winner_df=pd.merge(worldcup_winner,winner_data_df_1,on=['Year','Champion'])
worldcup_winner_df
| Date | Year | Venue | Host | home_team | away_team | Score | Champion | Notes | home_manager | ... | home_score | away_score | home_penalty | away_penalty | home_xg | away_xg | Total_goals | Runner_Up | TopScorer | Goals | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12/18/2022 | 2022 | Lusail Iconic Stadium, Lusail | Qatar | Argentina | France | (4) 3–3 (2) | Argentina | Argentina won on penalty kicks following extra... | Lionel Scaloni | ... | 3 | 3 | 4.0 | 2.0 | 3.3 | 2.2 | 12.0 | France | Kylian Mbappé | 8 |
| 1 | 7/15/2018 | 2018 | Olimpiyskiy stadion Luzhniki, Moskva | Russia | France | Croatia | 4–2 | France | NaN | Didier Deschamps | ... | 4 | 2 | 0.0 | 0.0 | 1.1 | 1.1 | 6.0 | Croatia | Harry Kane | 6 |
| 2 | 7/13/2014 | 2014 | Estadio Jornalista Mário Filho, Rio de Janeiro | Brazil | Germany | Argentina | 1–0 | Germany | Required Extra Time | Joachim Löw | ... | 1 | 0 | 0.0 | 0.0 | NaN | NaN | 1.0 | Argentina | James Rodríguez | 6 |
| 3 | 7/11/2010 | 2010 | Soccer City, Johannesburg | South Africa | Netherlands | Spain | 0–1 | Spain | Required Extra Time | Bert van Marwijk | ... | 0 | 1 | 0.0 | 0.0 | NaN | NaN | 1.0 | Netherlands | Wesley Sneijder, Thomas Müller... | 5 |
| 4 | 7/9/2006 | 2006 | Olympiastadion Berlin, Berlin | Germany | Italy | France | (5) 1–1 (3) | Italy | Italy won on penalty kicks following extra time | Marcello Lippi | ... | 1 | 1 | 5.0 | 3.0 | NaN | NaN | 10.0 | France | Miroslav Klose | 5 |
| 5 | 6/30/2002 | 2002 | Nissan Stadium, Yokohama | Korea Republic, Japan | Germany | Brazil | 0–2 | Brazil | NaN | Rudi Völler | ... | 0 | 2 | 0.0 | 0.0 | NaN | NaN | 2.0 | Germany | Ronaldo | 8 |
| 6 | 7/12/1998 | 1998 | Stade de France, Paris | France | Brazil | France | 0–3 | France | NaN | Mário Zagallo | ... | 0 | 3 | 0.0 | 0.0 | NaN | NaN | 3.0 | Brazil | Davor Šuker | 6 |
| 7 | 7/17/1994 | 1994 | Rose Bowl, Los Angeles | United States | Brazil | Italy | (3) 0–0 (2) | Brazil | Brazil won on penalty kicks following extra time | Carlos Alberto Parreira | ... | 0 | 0 | 3.0 | 2.0 | NaN | NaN | 5.0 | Italy | Hristo Stoichkov, Oleg Salenko | 6 |
| 8 | 7/8/1990 | 1990 | Stadio Olimpico, Rome | Italy | Germany | Argentina | 1–0 | Germany | NaN | Franz Beckenbauer | ... | 1 | 0 | 0.0 | 0.0 | NaN | NaN | 1.0 | Argentina | Salvatore Schillaci | 6 |
| 9 | 6/29/1986 | 1986 | Estadio Azteca, Mexico City | Mexico | Argentina | Germany | 3–2 | Argentina | NaN | Carlos Bilardo | ... | 3 | 2 | 0.0 | 0.0 | NaN | NaN | 5.0 | Germany | Gary Lineker | 6 |
10 rows × 26 columns
#Creating a dataset containing team with most apperance in finals and visualizing it
noha=worldcup_winner_df.Champion.to_list()+worldcup_winner_df.Champion.to_list()
matplotlib.rcParams['figure.figsize'] = (8,4)
sns.histplot(y=noha,color='#0000CD')
plt.xlabel('No of apperances in final')
plt.ylabel('Teams')
Text(0, 0.5, 'Teams')
Teams with most appearance in finals are Argentina, France, Brazil. Each team has qualified for finals of World Cup for 4 times as can be seen in above graph.
worldcup_winner_df.rename(columns={'Goals':'top_scorer_goals'})
| Date | Year | Venue | Host | home_team | away_team | Score | Champion | Notes | home_manager | ... | home_score | away_score | home_penalty | away_penalty | home_xg | away_xg | Total_goals | Runner_Up | TopScorer | top_scorer_goals | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12/18/2022 | 2022 | Lusail Iconic Stadium, Lusail | Qatar | Argentina | France | (4) 3–3 (2) | Argentina | Argentina won on penalty kicks following extra... | Lionel Scaloni | ... | 3 | 3 | 4.0 | 2.0 | 3.3 | 2.2 | 12.0 | France | Kylian Mbappé | 8 |
| 1 | 7/15/2018 | 2018 | Olimpiyskiy stadion Luzhniki, Moskva | Russia | France | Croatia | 4–2 | France | NaN | Didier Deschamps | ... | 4 | 2 | 0.0 | 0.0 | 1.1 | 1.1 | 6.0 | Croatia | Harry Kane | 6 |
| 2 | 7/13/2014 | 2014 | Estadio Jornalista Mário Filho, Rio de Janeiro | Brazil | Germany | Argentina | 1–0 | Germany | Required Extra Time | Joachim Löw | ... | 1 | 0 | 0.0 | 0.0 | NaN | NaN | 1.0 | Argentina | James Rodríguez | 6 |
| 3 | 7/11/2010 | 2010 | Soccer City, Johannesburg | South Africa | Netherlands | Spain | 0–1 | Spain | Required Extra Time | Bert van Marwijk | ... | 0 | 1 | 0.0 | 0.0 | NaN | NaN | 1.0 | Netherlands | Wesley Sneijder, Thomas Müller... | 5 |
| 4 | 7/9/2006 | 2006 | Olympiastadion Berlin, Berlin | Germany | Italy | France | (5) 1–1 (3) | Italy | Italy won on penalty kicks following extra time | Marcello Lippi | ... | 1 | 1 | 5.0 | 3.0 | NaN | NaN | 10.0 | France | Miroslav Klose | 5 |
| 5 | 6/30/2002 | 2002 | Nissan Stadium, Yokohama | Korea Republic, Japan | Germany | Brazil | 0–2 | Brazil | NaN | Rudi Völler | ... | 0 | 2 | 0.0 | 0.0 | NaN | NaN | 2.0 | Germany | Ronaldo | 8 |
| 6 | 7/12/1998 | 1998 | Stade de France, Paris | France | Brazil | France | 0–3 | France | NaN | Mário Zagallo | ... | 0 | 3 | 0.0 | 0.0 | NaN | NaN | 3.0 | Brazil | Davor Šuker | 6 |
| 7 | 7/17/1994 | 1994 | Rose Bowl, Los Angeles | United States | Brazil | Italy | (3) 0–0 (2) | Brazil | Brazil won on penalty kicks following extra time | Carlos Alberto Parreira | ... | 0 | 0 | 3.0 | 2.0 | NaN | NaN | 5.0 | Italy | Hristo Stoichkov, Oleg Salenko | 6 |
| 8 | 7/8/1990 | 1990 | Stadio Olimpico, Rome | Italy | Germany | Argentina | 1–0 | Germany | NaN | Franz Beckenbauer | ... | 1 | 0 | 0.0 | 0.0 | NaN | NaN | 1.0 | Argentina | Salvatore Schillaci | 6 |
| 9 | 6/29/1986 | 1986 | Estadio Azteca, Mexico City | Mexico | Argentina | Germany | 3–2 | Argentina | NaN | Carlos Bilardo | ... | 3 | 2 | 0.0 | 0.0 | NaN | NaN | 5.0 | Germany | Gary Lineker | 6 |
10 rows × 26 columns
worldcup_winner_df[['Year','Host','Champion','Runner_Up','Winning_captain',
'Winning_manager','Attendance','Score','Notes','TopScorer']]
| Year | Host | Champion | Runner_Up | Winning_captain | Winning_manager | Attendance | Score | Notes | TopScorer | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022 | Qatar | Argentina | France | Lionel Messi | Lionel Scaloni | 88966 | (4) 3–3 (2) | Argentina won on penalty kicks following extra... | Kylian Mbappé |
| 1 | 2018 | Russia | France | Croatia | Hugo Lloris | Didier Deschamps | 78011 | 4–2 | NaN | Harry Kane |
| 2 | 2014 | Brazil | Germany | Argentina | Philipp Lahm | Joachim Löw | 74738 | 1–0 | Required Extra Time | James Rodríguez |
| 3 | 2010 | South Africa | Spain | Netherlands | Iker Casillas | Vicente del Bosque | 84490 | 0–1 | Required Extra Time | Wesley Sneijder, Thomas Müller... |
| 4 | 2006 | Germany | Italy | France | Fabio Cannavaro | Marcello Lippi | 69000 | (5) 1–1 (3) | Italy won on penalty kicks following extra time | Miroslav Klose |
| 5 | 2002 | Korea Republic, Japan | Brazil | Germany | Cafu | Luiz Felipe Scolari | 69029 | 0–2 | NaN | Ronaldo |
| 6 | 1998 | France | France | Brazil | Didier Deschamps | Aimé Jacquet | 80000 | 0–3 | NaN | Davor Šuker |
| 7 | 1994 | United States | Brazil | Italy | Dunga | Carlos Alberto Parreira | 94194 | (3) 0–0 (2) | Brazil won on penalty kicks following extra time | Hristo Stoichkov, Oleg Salenko |
| 8 | 1990 | Italy | Germany | Argentina | Lothar Matthäus | Franz Beckenbauer | 73603 | 1–0 | NaN | Salvatore Schillaci |
| 9 | 1986 | Mexico | Argentina | Germany | Diego Maradona | Carlos Bilardo | 114600 | 3–2 | NaN | Gary Lineker |
#Creating data of teams most apperance in World Cup
teams_played_df=pd.DataFrame()
teams_played_df['Team']=["Germany","Brazil","Argentina","France","England","Spain","Italy","Netherlands","Belgium",
"Mexico","Korea Republic","Croatia","United States","Uruguay","Portugal","Japan","Cameroon","Sweden","Denmark",
"Switzerland","Nigeria","Costa Rica","Paraguay","Morocco","Colombia","Saudi Arabia","Poland","Australia","Ghana",
"IR Iran","Tunisia","Russia","Bulgaria","Romania","Ecuador","Republic of Ireland","Senegal","Chile","Algeria","Greece",
"Serbia","South Africa","Scotland","Côte d'Ivoire","Türkiye","Norway","Soviet Union","Egypt","Austria","Canada",
"Slovenia","Honduras","Ukraine","Czechoslovakia","Yugoslavia","FR Yugoslavia","Slovakia","Togo","Wales","Jamaica",
"Bolivia","United Arab Emirates","New Zealand","Iraq","Bosnia and Herzegovina","Trinidad and Tobago","Northern Ireland",
"Qatar","China PR","Czech Republic","Panama","Iceland","Peru","Serbia and Montenegro","Korea DPR","Hungary","Angola",
]
teams_played_df['matches_played']=teams_df.value_counts().to_list()
teams_played_df
| Team | matches_played | |
|---|---|---|
| 0 | Germany | 58 |
| 1 | Brazil | 57 |
| 2 | Argentina | 54 |
| 3 | France | 46 |
| 4 | England | 45 |
| ... | ... | ... |
| 72 | Peru | 3 |
| 73 | Serbia and Montenegro | 3 |
| 74 | Korea DPR | 3 |
| 75 | Hungary | 3 |
| 76 | Angola | 3 |
77 rows × 2 columns

print('Number of teams/countries participated in world cup from the year 1986 to 2022 are: {}.'.format(len(teams)))
Number of teams/countries participated in world cup from the year 1986 to 2022 are: 77.
print('Total Number of matches played in world cup from the year 1986 to 2022 are: {}.'.format(matches_df.home_team.count()))
Total Number of matches played in world cup from the year 1986 to 2022 are: 604.
teams_sorted=teams
teams_sorted.sort()
print('List of Countries/team appearing in World Cup from 1986 to 2022 are:\n',teams_sorted)
List of Countries/team appearing in World Cup from 1986 to 2022 are: ['Algeria', 'Angola', 'Argentina', 'Australia', 'Austria', 'Belgium', 'Bolivia', 'Bosnia and Herzegovina', 'Brazil', 'Bulgaria', 'Cameroon', 'Canada', 'Chile', 'China PR', 'Colombia', 'Costa Rica', 'Croatia', 'Czech Republic', 'Czechoslovakia', "Côte d'Ivoire", 'Denmark', 'Ecuador', 'Egypt', 'England', 'FR Yugoslavia', 'France', 'Germany', 'Ghana', 'Greece', 'Honduras', 'Hungary', 'IR Iran', 'Iceland', 'Iraq', 'Italy', 'Jamaica', 'Japan', 'Korea DPR', 'Korea Republic', 'Mexico', 'Morocco', 'Netherlands', 'New Zealand', 'Nigeria', 'Northern Ireland', 'Norway', 'Panama', 'Paraguay', 'Peru', 'Poland', 'Portugal', 'Qatar', 'Republic of Ireland', 'Romania', 'Russia', 'Saudi Arabia', 'Scotland', 'Senegal', 'Serbia', 'Serbia and Montenegro', 'Slovakia', 'Slovenia', 'South Africa', 'Soviet Union', 'Spain', 'Sweden', 'Switzerland', 'Togo', 'Trinidad and Tobago', 'Tunisia', 'Türkiye', 'Ukraine', 'United Arab Emirates', 'United States', 'Uruguay', 'Wales', 'Yugoslavia']
teams_played_df.head(15)
| Team | matches_played | |
|---|---|---|
| 0 | Germany | 58 |
| 1 | Brazil | 57 |
| 2 | Argentina | 54 |
| 3 | France | 46 |
| 4 | England | 45 |
| 5 | Spain | 44 |
| 6 | Italy | 40 |
| 7 | Netherlands | 39 |
| 8 | Belgium | 37 |
| 9 | Mexico | 36 |
| 10 | Korea Republic | 36 |
| 11 | Croatia | 30 |
| 12 | United States | 30 |
| 13 | Uruguay | 30 |
| 14 | Portugal | 29 |
matplotlib.rcParams['font.size']=14
matplotlib.rcParams['figure.figsize']=(15,5)
sns.barplot(data=teams_played_df.head(15),y='Team',x='matches_played',palette='crest_r',saturation=0.8)
plt.title('Teams with most participation in World Cup from 1986 to 2022 (Top 15)')
plt.ylabel('Teams/Countries')
plt.show()
from plotly.offline import iplot
import plotly.graph_objs as go
data=dict(type='choropleth',
locations=teams_played_df.Team,
locationmode='country names',
colorscale='teal',
z=teams_played_df.matches_played,
colorbar={'title':'No of Matches played'}
)
layout=dict(geo={'scope':'world'})
chmap=go.Figure(data=[data],layout=layout)
iplot(chmap)
It is clear from the Map that World Cup is not only dominated by European teams but also South American teams as well.
data=dict(type='choropleth',
locations=teams_played_df.Team,
locationmode='country names',
colorscale='teal',
z=teams_played_df.matches_played,
colorbar={'title':'No of Matches played'}
)
layout=dict(geo={'scope':'asia'})
chmap=go.Figure(data=[data],layout=layout)
iplot(chmap)
From Asia continent as well there is a lot participation. Here Dominant Teams are South Korea(Korea Republic), Japan ,and Saudi Arabia
data=dict(type='choropleth',
locations=teams_played_df.Team,
locationmode='country names',
colorscale='teal',
z=teams_played_df.matches_played,
colorbar={'title':'No of Matches played'}
)
layout=dict(geo={'scope':'africa'})
chmap=go.Figure(data=[data],layout=layout)
iplot(chmap)
Here Dominating countries/teams were Cameroon, Nigeria, Morocco

matches_df[matches_df.Round=='Final'][['Host','Year']][::-1]
| Host | Year | |
|---|---|---|
| 552 | Mexico | 1986 |
| 500 | Italy | 1990 |
| 448 | United States | 1994 |
| 384 | France | 1998 |
| 320 | Korea Republic, Japan | 2002 |
| 256 | Germany | 2006 |
| 192 | South Africa | 2010 |
| 128 | Brazil | 2014 |
| 64 | Russia | 2018 |
| 0 | Qatar | 2022 |
matplotlib.rcParams['figure.figsize']=(20,5)
matplotlib.rcParams['font.size']=14
sns.lineplot(x=ywd.Host ,y=ywd.Attendance,color='#5900b3')
plt.title('Total Attendance Host Wise')
plt.ylabel("Attendance (in millions)")
plt.xlabel('Host Nation')
plt.show()
It is clear over the year trend has been up and down but we can say that total attendance is overall increasing with years expect for in 1994 when USA was host.
sns.lineplot(x=ywd.index,y=ywd.Attendance,color='#5900b3',marker='s')
plt.title('Total Attendance Year Wise')
plt.ylabel("Attendance (in millions)")
plt.xlabel('Year')
plt.show()
Despite having less No of matches played is 1994 when USA was host it recorded highest attendance when compared to attendance form 1986 to 2022.
ywd[['Host','No_of_matches','No_of_teams','Attendance','Average_attendance']]
| Host | No_of_matches | No_of_teams | Attendance | Average_attendance | |
|---|---|---|---|---|---|
| Year | |||||
| 1986 | Mexico | 52 | 24 | 2394031 | 46039.057692 |
| 1990 | Italy | 52 | 24 | 2516215 | 48388.750000 |
| 1994 | United States | 52 | 24 | 3587538 | 68991.115385 |
| 1998 | France | 64 | 32 | 2903477 | 45366.828125 |
| 2002 | Korea Republic, Japan | 64 | 32 | 2705337 | 42270.890625 |
| 2006 | Germany | 64 | 32 | 3352605 | 52384.453125 |
| 2010 | South Africa | 64 | 32 | 3178856 | 49669.625000 |
| 2014 | Brazil | 64 | 32 | 3429873 | 53591.765625 |
| 2018 | Russia | 64 | 32 | 3031768 | 47371.375000 |
| 2022 | Qatar | 64 | 32 | 3404252 | 53191.437500 |
plt.plot(ywd.Host,ywd.Average_attendance,color='#5900b3')
plt.title('Attendance per match')
plt.ylabel("Attendance")
plt.xlabel('Host Nation')
plt.show()
plt.plot(ywd.index,ywd.Average_attendance,color='#5900b3',marker='o')
plt.title('Attendance per match')
plt.ylabel("Attendance")
plt.xlabel('Host Nation')
plt.show()
It is clear now after taking attendance per match in 1994 attendance was highest when USA was host. Also considering attendance per match graph we can see there is slight increase in attendance per match over the years except in 1994 when USA was host.
gwd_df[['No. of matches','Total_Attendance','Average_Attendance']]
| No. of matches | Total_Attendance | Average_Attendance | |
|---|---|---|---|
| Round | |||
| Final | 10 | 826631 | 82663.100000 |
| Group stage | 444 | 21329631 | 48039.709459 |
| Quarter-finals | 40 | 2319091 | 57977.275000 |
| Round of 16 | 80 | 4102200 | 51277.500000 |
| Semi-finals | 20 | 1390159 | 69507.950000 |
| Third-place match | 10 | 536240 | 53624.000000 |
sns.lineplot(data=gwd_df.sort_values(by='Average_Attendance'),x='Round',y='Average_Attendance',color='#5900b3')
plt.title('Trend of attendance Round/Stage Wise')
plt.ylabel('Attendance')
plt.show()
matplotlib.rcParams['figure.figsize']=(15,4)
sns.barplot(data=gwd_df.sort_values(by='Average_Attendance',ascending=False),y='Round',x='Average_Attendance',
palette='mako',saturation=0.8)
plt.title('Average Attendance in World Cup (1986 to 2022) on basis of Round')
plt.xlabel('Attendance')
plt.ylabel(None)
plt.show()
From above two graphs it is clear that attendance per round is highest in final followed by semi-finals and then quarter-finals which makes a lot of sense but attendance in third-place match is slightly less when considering the importance of match.

ywd[['Host','Total_home_goals','Total_away_goals','Total_home_penalty','Total_away_penalty','normal_goals',
'Total_penalty','Total_goals']]
| Host | Total_home_goals | Total_away_goals | Total_home_penalty | Total_away_penalty | normal_goals | Total_penalty | Total_goals | |
|---|---|---|---|---|---|---|---|---|
| Year | ||||||||
| 1986 | Mexico | 74 | 58 | 11.0 | 10.0 | 132 | 21.0 | 153.0 |
| 1990 | Italy | 67 | 48 | 14.0 | 14.0 | 115 | 28.0 | 143.0 |
| 1994 | United States | 83 | 58 | 8.0 | 10.0 | 141 | 18.0 | 159.0 |
| 1998 | France | 98 | 73 | 11.0 | 9.0 | 171 | 20.0 | 191.0 |
| 2002 | Korea Republic, Japan | 89 | 72 | 6.0 | 7.0 | 161 | 13.0 | 174.0 |
| 2006 | Germany | 86 | 61 | 10.0 | 11.0 | 147 | 21.0 | 168.0 |
| 2010 | South Africa | 76 | 69 | 9.0 | 5.0 | 145 | 14.0 | 159.0 |
| 2014 | Brazil | 81 | 90 | 14.0 | 12.0 | 171 | 26.0 | 197.0 |
| 2018 | Russia | 91 | 78 | 12.0 | 14.0 | 169 | 26.0 | 195.0 |
| 2022 | Qatar | 101 | 71 | 15.0 | 11.0 | 172 | 26.0 | 198.0 |
matplotlib.rcParams['figure.figsize'] = (20, 5)
sns.lineplot(data=ywd,x=ywd.index,y='Total_home_goals',marker='o',color='#5900b3')
sns.lineplot(data=ywd,x=ywd.index,y='Total_away_goals',marker='s',color='turquoise')
plt.title('Trend of goals scored by Year wise')
plt.xlabel('Year')
plt.ylabel('No. of goals scored')
plt.legend(['Home goals','Away goals'])
plt.show()
plt.plot(ywd.Host,ywd.Total_home_goals,color='#5900b3')
plt.plot(ywd.Host,ywd.Total_away_goals,color='turquoise')
plt.title('Trend of goals scored Host wise')
plt.xlabel('Host Nation')
plt.ylabel('No. of goals scored')
plt.legend(['Home goals','Away goals'])
plt.show()
It is clear from above two graphs that home goals are dominated by away goals meaning chance of team winning as home team is slightly high as compared when team is playing as away team.
Only in 2014 when host nation was Brazil the away team goals dominated home team goals which is quite intriguing.
fig,axes=plt.subplots(1,2)
sns.barplot(data=ywd,y=ywd.index,x='Total_home_goals',orient='h',palette='mako',saturation=10,ax=axes[0])
axes[0].set_title('Home Goals by years')
axes[0].set_xlabel('Home Goals')
sns.barplot(data=ywd,y=ywd.index,x='Total_away_goals',orient='h',palette='rocket',saturation=10,ax=axes[1])
axes[1].set_title('Away Goals by years')
axes[1].set_xlabel('Away Goals')
plt.show()
As such there no trend with home goals but we can see that from 1990 number of home goals have increased and peaked in 1998 and then start decreasing till 2010 and then again started increasing. Same can be said about away goals as well.
Total home goals scored was maximum in 2022 when host was Qatar whereas total away goals scored was maximum in 2014 when host was Brazil.
sns.lineplot(data=ywd,x=ywd.index,y='Total_home_penalty',marker='s',color='#5900b3')
sns.lineplot(data=ywd,x=ywd.index,y='Total_away_penalty',marker='o',color='turquoise')
plt.title('Trend of Penalty scored year wise')
plt.xlabel('Year')
plt.ylabel('No. of Penalty')
plt.legend(['Home penalty','Away penalty'])
plt.show()
sns.lineplot(data=ywd,x='Host',y='Total_home_penalty',color='#5900b3')
sns.lineplot(data=ywd,x='Host',y='Total_away_penalty',color='turquoise')
plt.title('Trend of Penalty scored Host wise[1986 to 2022]')
plt.xlabel('Year')
plt.ylabel('No. of Penalty')
plt.legend(['Home penalty','Away penalty'])
plt.show()
No trend can identified considering graph of home penalty and away penalty. Distribution is all over the place and no exact prediction can made on winning of home team or away team. Penalties here are also like 50-50 as in real life. So there is clearly no advantage to one team when it comes to penalties.
sns.lineplot(data=ywd,x='Host',y='normal_goals',color='#5900b3')
plt.title('Trend of Normal goals scored Host wise[1986 to 2022]')
plt.xlabel('Host Nation')
plt.ylabel('No. of goals')
plt.show()
sns.lineplot(data=ywd,x='Host',y='Total_penalty',color='#5900b3')
plt.title('Trend of penalty goals scored Host wise[1986 to 2022]')
plt.xlabel('Year')
plt.ylabel('No. of penalty')
plt.show()
From above two graphs we can see that in 1990 when Italy normals goals where less as compared to other years and penalty goals where more as compared to other years World Cup. Therefore we can say that chance of matches going to penalties is high as compared to other WOrld Cups.
sns.lineplot(data=ywd,x='Host',y='Total_goals',color='#5900b3')
plt.title('Trend of Total Goals scored Host wise[1986 to 2022]')
plt.xlabel('Year')
plt.ylabel('No. of Goals')
plt.show()
Trend here suggest that over the year total no of goals scored are increasing but as we move ahead year wise that from year 2014 when Brazil there is only slight increase in total no of goals scored when compared to 2022 when host Qatar.
Note:Total goals scored here is Normal goals + Penalty goals
gwd_df[['No. of matches','goals_per_match','penalty_goals_per_match','Total_goals_per_match']]
| No. of matches | goals_per_match | penalty_goals_per_match | Total_goals_per_match | |
|---|---|---|---|---|
| Round | ||||
| Final | 10 | 2.700000 | 1.9000 | 4.600000 |
| Group stage | 444 | 2.502252 | 0.0000 | 2.502252 |
| Quarter-finals | 40 | 2.175000 | 2.3250 | 4.500000 |
| Round of 16 | 80 | 2.700000 | 0.9375 | 3.637500 |
| Semi-finals | 20 | 2.250000 | 1.3000 | 3.550000 |
| Third-place match | 10 | 3.800000 | 0.0000 | 3.800000 |
sns.barplot(data=gwd_df.sort_values('goals_per_match',ascending=False),y='Round',x='goals_per_match',orient='h',
palette='mako')
plt.title('Trend of Goals per match based on Round/Stage')
plt.xlabel('No of goals')
plt.show()
Here from graph we can see that most goals per match round wise is highest in third-place match which is quite interesting because we expect that most goals in a match would be higher in either in finals as there is more competition in finals or in group stage as teams aren't concerned about scoreline in this stage.
sns.barplot(data=gwd_df.sort_values('penalty_goals_per_match',ascending=False),y='Round',x='penalty_goals_per_match',
orient='h',palette='mako')
plt.title('Trend of Penalty scored per match based on Round/Stage')
plt.xlabel('No of penalty scored')
plt.show()
Above graph give us very different perspective as you would expect highest penalty per match in final or semi-finals due to high level competition at that stage but most penalty scored per match is highest in Quarter-finals.
sns.barplot(data=gwd_df.sort_values('Total_goals_per_match',ascending=False),y='Round',x='Total_goals_per_match',
orient='h',palette='mako')
plt.title('Trend of Total goals scored per match based on Round/Stage')
plt.xlabel('No of goals scored')
plt.show()
According to graph highest total goals per match is in finals which make sense because of high level of competition at that stage but second highest turns out to be in quarter-finals which quite contradictory to our thought process because after the finals, the semi-finals is most competitive stage.

worldcup_winner_df[['Year','Champion','Runner_Up','Score','Attendance','Notes']][::-1]
| Year | Champion | Runner_Up | Score | Attendance | Notes | |
|---|---|---|---|---|---|---|
| 9 | 1986 | Argentina | Germany | 3–2 | 114600 | NaN |
| 8 | 1990 | Germany | Argentina | 1–0 | 73603 | NaN |
| 7 | 1994 | Brazil | Italy | (3) 0–0 (2) | 94194 | Brazil won on penalty kicks following extra time |
| 6 | 1998 | France | Brazil | 0–3 | 80000 | NaN |
| 5 | 2002 | Brazil | Germany | 0–2 | 69029 | NaN |
| 4 | 2006 | Italy | France | (5) 1–1 (3) | 69000 | Italy won on penalty kicks following extra time |
| 3 | 2010 | Spain | Netherlands | 0–1 | 84490 | Required Extra Time |
| 2 | 2014 | Germany | Argentina | 1–0 | 74738 | Required Extra Time |
| 1 | 2018 | France | Croatia | 4–2 | 78011 | NaN |
| 0 | 2022 | Argentina | France | (4) 3–3 (2) | 88966 | Argentina won on penalty kicks following extra... |
sns.lineplot(data=worldcup_winner[::-1],x='Host',y='Attendance',color='#5900b3')
plt.title('Trend of Attendance in Final stage of World Cup based on Host Nation')
plt.xlabel('Host Nation')
plt.show()
sns.lineplot(data=worldcup_winner[::-1],x='Year',y='Attendance',color='#5900b3',marker='s')
plt.title('Trend of Attendance in Final stage of World Cup based on Years')
plt.xlabel('Year')
plt.show()
Highest attendance in finals in accordance to graph is in 1986 when host was Mexico even though we have previously seen that average attendance was highest in 1994 when host by USA. These average attendance being high in 1994 when USA was host can be related to sudden spike in attendance in finals when host by USA in 1994 as can be seen in graph.
matplotlib.rcParams['figure.figsize']=(20,5)
sns.lineplot(data=worldcup_winner_1[::-1], x='Host', y='Total_goals',color='#5900b3')
plt.title('Total goals in final game of every FIFA World Cup from 1986 to 2022 (Including Penalties)')
plt.xlabel(None)
plt.ylabel('Goals')
plt.show()
sns.lineplot(data=worldcup_winner_1[::-1], x='Year', y='Total_goals',marker='s',color='#5900b3')
plt.title('Total goals in final game of every FIFA World Cup (Including Penalties)')
plt.ylabel('Goals')
plt.show()
Sudden spike in goals in year 1994,2006,2022 can be explained due to fact that matches went to penalties as can be seen in data frame given below.
Trend of goals in final game is kind of all over the place but we can say it is increasing if consider the sudden spike are due to penalties.
worldcup_winner[['Year','Host','Champion','Score','Notes']]
| Year | Host | Champion | Score | Notes | |
|---|---|---|---|---|---|
| 0 | 2022 | Qatar | Argentina | (4) 3–3 (2) | Argentina won on penalty kicks following extra... |
| 64 | 2018 | Russia | France | 4–2 | NaN |
| 128 | 2014 | Brazil | Germany | 1–0 | Required Extra Time |
| 192 | 2010 | South Africa | Spain | 0–1 | Required Extra Time |
| 256 | 2006 | Germany | Italy | (5) 1–1 (3) | Italy won on penalty kicks following extra time |
| 320 | 2002 | Korea Republic, Japan | Brazil | 0–2 | NaN |
| 384 | 1998 | France | France | 0–3 | NaN |
| 448 | 1994 | United States | Brazil | (3) 0–0 (2) | Brazil won on penalty kicks following extra time |
| 500 | 1990 | Italy | Germany | 1–0 | NaN |
| 552 | 1986 | Mexico | Argentina | 3–2 | NaN |
sns.histplot(x=noha,color='#008080')
plt.title('Teams with most appearance in finals of World Cup')
plt.xlabel('Teams')
plt.ylabel('No of Appearances')
plt.show()
Teams with most appearance in finals are Argentina, France, Brazil. Each team has qualified for finals of World Cup for 4 times as can be seen in above graph.
wcw_df
| Team | Wins | |
|---|---|---|
| 0 | Argentina | 2 |
| 1 | France | 2 |
| 2 | Brazil | 2 |
| 3 | Germany | 1 |
| 4 | Spain | 1 |
| 5 | Italy | 1 |
| 6 | West Germany | 1 |
sns.barplot(data=wcw_df,x='Team',y='Wins',palette='crest_r')
plt.title('World Cup Winning Teams from 1986 to 2022')
plt.xlabel(None)
plt.show()
Teams with most World Cups from 1986 to 2022 are Argentina, France, Brazil. Each team has won World Cup 2 times as can be seen in above graph.

mhw_df
| Teams | Score | |
|---|---|---|
| 0 | Brazil | 29 |
| 1 | Germany | 26 |
| 2 | Argentina | 26 |
| 3 | France | 17 |
| 4 | Italy | 15 |
| 5 | Netherlands | 14 |
| 6 | Belgium | 13 |
| 7 | England | 12 |
| 8 | Portugal | 11 |
| 9 | Spain | 9 |
| 10 | Sweden | 6 |
| 11 | Korea Republic | 6 |
| 12 | Mexico | 5 |
| 13 | Uruguay | 5 |
| 14 | Colombia | 4 |
matplotlib.rcParams['figure.figsize'] = (15, 5)
sns.barplot(data=mhw_df,x='Score',y='Teams',orient='h',palette='crest_r',saturation=0.8)
plt.title('Teams with most wins as Home Team during the World Cup campaign of 1986 to 2022(Top 15)')
plt.xlabel('Wins')
plt.ylabel(None)
plt.show()
Most Successful home team is Brazil as they have won 29 times when regarded as Home team.
maw_df
| Teams | Score | |
|---|---|---|
| 0 | Spain | 14 |
| 1 | Germany | 11 |
| 2 | France | 11 |
| 3 | Brazil | 10 |
| 4 | Mexico | 9 |
| 5 | Croatia | 9 |
| 6 | Netherlands | 8 |
| 7 | England | 7 |
| 8 | Denmark | 6 |
| 9 | Italy | 6 |
| 10 | Switzerland | 6 |
| 11 | Uruguay | 6 |
| 12 | Belgium | 5 |
| 13 | Argentina | 5 |
| 14 | Senegal | 5 |
sns.barplot(data=maw_df,x='Score',y='Teams',orient='h',palette='crest_r',saturation=1)
plt.title('Teams with most wins as Away Team during the World Cup campaign of 1986 to 2022(Top 15)')
plt.xlabel('Wins')
plt.ylabel(None)
plt.show()
Most Successful away team is Spain as they have won 29 times when regarded as Home team.
tmw
| Team | No. of wins | |
|---|---|---|
| 0 | Brazil | 42 |
| 1 | Germany | 40 |
| 2 | Argentina | 37 |
| 3 | France | 30 |
| 4 | Spain | 24 |
| 5 | Netherlands | 23 |
| 6 | Italy | 22 |
| 7 | England | 20 |
| 8 | Belgium | 19 |
| 9 | Croatia | 17 |
| 10 | Mexico | 14 |
| 11 | Portugal | 13 |
| 12 | Uruguay | 12 |
| 13 | Denmark | 9 |
| 14 | Sweden | 9 |
ax=sns.barplot(data=tmw,y='Team',x='No. of wins',orient='h',palette='crest_r',saturation=0.9)
ax.set(title='Most Successful team during the World Cup campaign of 1986 to 2022 (Top 15)',ylabel=None,xlabel='Wins')
for p in ax.patches:
height=p.get_height()
width=p.get_width()
ax.text(x=width+0.1, y=p.get_y()+(height/2),s='{:.0f}'.format(width) , va='center')
plt.show()
Most Successful team during the World Cup from 1986 to 2022 is Brazil as they have win 42 games in whole campaign, won the World Cup 2 times, and have appeared in finals 4 times.
most_win_manager.head(20)
| Manager | Country_managed | Wins | |
|---|---|---|---|
| 0 | Luiz Felipe Scolari | Brazil | 16 |
| 1 | Didier Deschamps | France | 14 |
| 2 | Joachim Löw | Germany | 12 |
| 3 | Carlos Alberto Parreira | [South Africa,Brazil] | 11 |
| 4 | Zlatko Dalić | Croatia | 10 |
| 5 | Franz Beckenbauer | Germany | 10 |
| 6 | Óscar Tabárez | Uruguay | 10 |
| 7 | Carlos Bilardo | Argentina | 10 |
| 8 | Louis van Gaal | Netherlands | 9 |
| 9 | Guus Hiddink | [Australia, Korea Republic, Netherlands] | 8 |
| 10 | Jürgen Klinsmann | [United States,Germany] | 7 |
| 11 | Gareth Southgate | England | 7 |
| 12 | Roberto Martínez | Belgium | 7 |
| 13 | Henri Michel | [Côte d'Ivoire, Morocco, France] | 7 |
| 14 | Aimé Jacquet | France | 7 |
| 15 | Vicente del Bosque | Spain | 6 |
| 16 | Lionel Scaloni | Argentina | 6 |
| 17 | José Pekerman | Colombia | 6 |
| 18 | Alejandro Sabella | Argentina | 6 |
| 19 | Marcello Lippi | Italy | 6 |
matplotlib.rcParams['figure.figsize']=(20,8)
matplotlib.rcParams['font.size']=16
sns.barplot(y=most_win_manager.head(15).Manager,x=most_win_manager.head(15).Wins,palette='mako',saturation=0.8)
plt.ylabel('Managers')
plt.title('Most Successful Managers during the World-Cup champaign of 1986 to 2022 (Top 15)')
plt.show()
Manager Luiz Felipe Scolari is the most successful manager in World Cup from 1986 to 2022. He also has won the World Cup with Brazil one time with Top Scorer as Ronaldo (aka R9) whose was also managed by Luiz Felipe Scolari.
ccd_df.head(10)
| Captain | Team | Wins | |
|---|---|---|---|
| 0 | Hugo Lloris | France | 14 |
| 1 | Lionel Messi | Argentina | 13 |
| 2 | Diego Maradona | Argentina | 12 |
| 3 | Cafu | Germany | 11 |
| 4 | Philipp Lahm | Brazil | 11 |
| 5 | Luka Modrić | Croatia | 10 |
| 6 | Dunga | Germany | 9 |
| 7 | Lothar Matthäus | Brazil | 9 |
| 8 | Thiago Silva | Brazil | 9 |
| 9 | Harry Kane | France | 7 |
sns.barplot(y=ccd_df.head(15).Captain+' ('+ccd_df.head(15).Team+')',x=ccd_df.head(15).Wins,orient='h',
palette='mako',saturation=0.8)
plt.ylabel('Players as Captains (Team Captained)')
plt.title('Most Successful Captains during the World-Cup champaign of 1986 to 2022 (Top 15)')
plt.show()
Most Successful Captain during the World Cup Campaign of 1986 to 2022 is Hugo Lloris who plays for France. He also lead the team to win World Cup recently in Year 2018 when host was Russia as Captain of the team France.
wcw_df_1=worldcup_winner_df[['Year','Host','Champion','Winning_captain','Winning_manager','TopScorer','Goals']][::-1]
wcw_df_1.rename(columns={'Goals':'Topscorer_goals'},inplace=True)
wcw_df_1
| Year | Host | Champion | Winning_captain | Winning_manager | TopScorer | Topscorer_goals | |
|---|---|---|---|---|---|---|---|
| 9 | 1986 | Mexico | Argentina | Diego Maradona | Carlos Bilardo | Gary Lineker | 6 |
| 8 | 1990 | Italy | Germany | Lothar Matthäus | Franz Beckenbauer | Salvatore Schillaci | 6 |
| 7 | 1994 | United States | Brazil | Dunga | Carlos Alberto Parreira | Hristo Stoichkov, Oleg Salenko | 6 |
| 6 | 1998 | France | France | Didier Deschamps | Aimé Jacquet | Davor Šuker | 6 |
| 5 | 2002 | Korea Republic, Japan | Brazil | Cafu | Luiz Felipe Scolari | Ronaldo | 8 |
| 4 | 2006 | Germany | Italy | Fabio Cannavaro | Marcello Lippi | Miroslav Klose | 5 |
| 3 | 2010 | South Africa | Spain | Iker Casillas | Vicente del Bosque | Wesley Sneijder, Thomas Müller... | 5 |
| 2 | 2014 | Brazil | Germany | Philipp Lahm | Joachim Löw | James Rodríguez | 6 |
| 1 | 2018 | Russia | France | Hugo Lloris | Didier Deschamps | Harry Kane | 6 |
| 0 | 2022 | Qatar | Argentina | Lionel Messi | Lionel Scaloni | Kylian Mbappé | 8 |
wcw_df_2=wcw_df_1.astype({'Year':'str'})
wcw_df_2=wcw_df_2.sort_values('Topscorer_goals',ascending=False)
sns.barplot(y=wcw_df_2.Year+' | '+wcw_df_2.Host+' | '+wcw_df_2.TopScorer,x=wcw_df_2.Topscorer_goals,palette='mako')
plt.ylabel('Year | Host Nation | Top Scorer')
plt.xlabel('Goals')
plt.title('Top scorer goals based on Host Nation')
plt.show()
Over the years top scorer have been different from previous years because world cup occurs after every 4 year. Ronaldo (aka R9) and Kylian Mbappe both have scored 8 goals for their nation which is highest in history of World Cup as can be seen from the graph.
Golden boots is a award given to player who has scored most goals in World Cup in that year. So all Player/Top Scorer in the graph have won the golden boots in that particular year.
There are many conclusion which can be drawn using this dataset. The conclusion mentioned below are listed in sections.
Probability of World Cup Finals going in overtime or penalties is 50.00%.
Probability of World Cup Finals going in penalties is 30.00%.
Probability of World Cup Finals ending in overtime/extratime is 20.00%.
South Korea made history by becoming the first member of the Asian Football Confederation (AFC) to get to a FIFA World Cup semifinal.
At the 2022 FIFA World Cup, Morocco exceeded all predictions by winning their group, which also included the 2018 runners-up Croatia, and by defeating elite opponents like Belgium, Spain, and Portugal.
Morocco became the first country from Africa to ever get to the semifinals.
On June 22, 2002, South Korea defeated European giants Spain 5-3 (0-0) in the quarterfinal in penalty shootout, making history in the process.
The 2002 World Cup was the first and only time the competition was jointly staged by two nations, South Korea and Japan once.
The World Cup held in Qatar in 2022 was the first time when a Middle East Nation hosted and just the second time in Asia (the previous time was in 2002).
Number of unique teams/countries participated in world cup from the year 1986 to 2022 are: 77.
Teams with most appearance in World Cup is Germany with 58 matches played followed by Brazil with 57 and then Argentina with 54.
World Cup with Most Attendance was in year 1994 when host was USA with Total Attendance being 3,587,538 or 3.58 million.
World Cup with most Attendance per match was also in year 1994 when host was USA with Attendance per match was approximately 68,992 or 68K.
World Cup with most Attendance in Finals was in year 1986 when Mexico was the host with attendance in finals being 114,600 or 114K.
Most goals scored in Final of World Cup is 12 goals being scored in 2022 when Qatar was host. This finals was the most nerve-racking finals in the history of World Cup because of high level of competition, 12 goals were scored in a single match, and match going in overtime only to be decided by penalties.
One of the unluckiest teams at the World Cup is Mexico. They've lost more games than any other team in tournament history.
Team with most qualification for finals and team winning most World Cups were Argentina, Brazil, France. Each team appearing in finals for 4 times and winning 2 times.
In Conclusion the most Successful team during the World Cup campaign of 1986 to 2022 was Brazil as they won 42 games in whole campaign, won the World Cup 2 times, and have appeared in finals 4 times.
The biggest margin of victory in World Cup was achieved at the 2002 World Cup, jointly hosted by Japan and South Korea, when Germany defeated Saudi Arabia 8-0.
Manager Luiz Felipe Scolari is the most successful manager during the World Cup from 1986 to 2022.
He managed to win 16 games as manager during the whole campaign of 1986 to 2022.
He also won the World Cup with Brazil one time with Top Scorer as Ronaldo (Brazilian) who was also managed by him.
Most Successful Captain during the World Cup Campaign of 1986 to 2022 is Hugo Lloris who plays for France.
As a Captain he managed to win 14 games which highest as compared any other captain.
He also lead the team to win World Cup recently in Year 2018 when host was Russia as Captain of the team France.
Link to github repository - https://github.com/meetth77/jovian_practice
Link to kaggle website(from where data is taken)- https://www.kaggle.com/datasets/piterfm/fifa-football-world-cup
Link to Images:
Image 1:https://images8.alphacoders.com/128/1288503.jpg
Image 2:https://mir-s3-cdn-cf.behance.net/project_modules/max_1200/763bd417699179.562bdb4d800a4.jpg
Image 3:https://upload.wikimedia.org/wikipedia/commons/9/95/Poland_Senegal_2018-06-19_02.jpg
Image 6:https://i.pinimg.com/564x/4d/8d/a1/4d8da14badd023b10cbfd4d0df33825c.jpg
Here the data taken is from 1986 to 2022 ie last 10 World Cup so, in future I would consider the whole dataset that is from 1930 to 2022 and Analyze the whole data get insight from that data.
In future we can use the data of attendance in order to get revenue from ticket selling.
Also the data of attendance, home goals, away goals, home penalty, away penalty and top scorer can be used in order to predict upcoming World Cup goals and attendance and how successful will be the Upcoming World Cups.
Based on the data provided we can find out different Probability and Statistics related to World Cup which will be helpful in future predictions.
Also this data analytics project has future scope to Sport Analytics.
import jovian
jovian.commit()